Monday, April 27, 2009

At work, we use Informatica. In fact, my team is on the bleeding edge of Informatica - we're using 8.6.1. Which is so new, it didn't even work right when the vendor tried to install it (it wasn't compatible with the Data Quality piece), and they had to reconfigure it (they being Informatica technical support) just to get it running. That's how far out on the bleeding edge we are. You can't get much more bleeding edge than that.

Since Informatica will be a topic du jour for this blog, I'll start out by giving a few Informatica tips. First off:
Don't believe that Informatica can do everything. Quite simply - it cannot. For example, Informatica is exceedingly bad at deleting data. Really, really quite bad, in fact. In a word - atrocious. Don't do it. At work, we have a workflow that runs "daily" which attempts to purge old data. At last count, said job had been running for 9 days. That means, on the off chance that it actually finishes, when it starts up again, it's going to be 9 days behind.

A co-worker once gave me a bit of wisdom. He said "A daily job should finish in under 24 hours." He was a smart guy.
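
For what it's worth, the sane fix for that kind of purge is the one a couple of commenters suggest below: push the delete down to the database and run it in batches, committing as it goes. Here's a rough sketch of the idea, wrapped in a Korn shell script - it assumes an Oracle target, sqlplus on the path, and completely made-up names (STAGE_EVENTS, LOAD_DATE, the DB_* environment variables), so treat it as a starting point, not a spec.

    #!/usr/bin/ksh
    # Hypothetical batched purge pushed down to the database.
    # Assumes an Oracle target and sqlplus; the table, column, and
    # DB_USER/DB_PASS/DB_TNS variables are placeholders.

    sqlplus -s "$DB_USER/$DB_PASS@$DB_TNS" <<'EOF'
    whenever sqlerror exit failure
    declare
      rows_deleted pls_integer;
    begin
      loop
        delete from stage_events
         where load_date < sysdate - 90   -- keep 90 days of history
           and rownum <= 10000;           -- cap each pass at 10,000 rows
        rows_deleted := sql%rowcount;
        commit;                           -- release undo and locks between passes
        exit when rows_deleted = 0;
      end loop;
    end;
    /
    exit
    EOF

The 90-day cutoff and the 10,000-row batch size are pulled out of thin air - tune them against the real table. The point is simply that a set-based delete at the database is the right tool for this job, not a mapping.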

Informatica also isn't great at dealing with file manipulation. It's quite clumsy in that space. You're better off using DOS batch files (I cringe every time I say that), or (if you're luckier than I am) UNIX Korn shell scripts (or whatever flavor of script you prefer). They're a much better option - far more flexible and powerful.
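
To make that concrete, here's the flavor of thing that's a few lines of ksh but a genuine headache inside a workflow: sweeping processed flat files into a dated archive and compressing the old ones. The directory names and the .dat extension are invented for the example - adjust to taste.

    #!/usr/bin/ksh
    # Hypothetical post-load file cleanup. Paths and extensions are
    # made up; the pattern is what matters.

    SRC_DIR=/data/informatica/SrcFiles
    ARCH_DIR=/data/informatica/archive/$(date +%Y%m%d)

    mkdir -p "$ARCH_DIR" || exit 1

    # Move the files the load just finished with out of the source directory
    for f in "$SRC_DIR"/*.dat; do
      [ -f "$f" ] || continue          # glob matched nothing; skip
      mv "$f" "$ARCH_DIR"/
    done

    # Compress anything older than a week so the mount doesn't fill up
    find /data/informatica/archive -type f -name '*.dat' -mtime +7 -exec gzip {} \;

Try doing the same thing inside a workflow and you'll see what I mean.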

Struggling with a problem in Informatica? Drop me a line - I won't guarantee that I know the answer, but I'll give it the ol' college try - which means I'll just steal someone else's answer from the internet.

Until next time...

4 comments:

  1. How much data are you deleting? Have you tried pushing the task down to the database? If something is taking 9 days, then it is hard to believe that the cause is just the ETL platform.

  2. I'm not that familiar with the job in question, but I agree that a stored procedure (or something at the DB level) is a much better solution than what they have. The real issue is that this particular table apparently generates hundreds of thousands of rows a day.

  3. In my opinion, it should not be a problem for the tool, even with hundreds of thousands of rows. If it is taking time to generate the rows for the delete, then it is a tuning issue with many possible variables, including the source database, the network, the mapping logic, etc. If it is taking that long to write the data, it is a database configuration issue and/or network-related. Is this one of your problem mappings that needs tuning/review?

  4. Honestly, it's not a job I've ever really looked at, and it's not something anyone considers overly critical. I'm pretty sure someone has been tasked to rewrite it, which is probably a good thing.
