Friday, May 1, 2009

Informatica and the little things

I've always felt like it's the little things in life that matter. It's always the small things that either bite you in the ass or turn out to be, as they say, life's little pleasures. There doesn't seem to be any middle ground when it comes to little things. Take my 3-year-old, for example. A couple of weeks ago, I asked her if Daddy was "bootylicious". She responded that she would not like to eat my booty. I can't blame her there. I wouldn't either.

Anyway, like butts, there are some things in Informatica that kind of stink, like worklets. I don't like worklets. I never have. I'm not even sure I know what the point is. Early on, when I took over the system we have, the previous developers had lumped every task into logical groupings and put each grouping into a worklet. So, the workflow was literally one worklet after another. And each worklet was single-threaded. There was nothing running in parallel. And because it was all in worklets, you couldn't restart the workflow from the point where it died. So, maintaining the thing was a nightmare.

One of the first things I did was to rebuild the entire workflow. I scrapped every worklet and put everything into one gigantic workflow. Oh sure, this was complicated, but it meant that I could load multiple things at a time, and do completely unrelated things in parallel. It also meant that if something died, the rest of the stuff would continue on until the dependencies came back together again, but I knew I could always "restart workflow from task" at that point. Life became good. I also shaved about 4 hours off the total run time. Good stuff.
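To make the parallel-versus-serial point concrete, here's a rough sketch in Python. The task names, durations, and dependencies are all made up for illustration (they are not my actual workflow), and this has nothing to do with how Informatica schedules things internally. It just compares running everything one after another against letting independent tasks run side by side, and shows which tasks have to wait if one of them dies.

```python
from functools import lru_cache

# Hypothetical tasks: name -> (duration in minutes, upstream dependencies).
# None of these names or numbers come from a real workflow.
TASKS = {
    "load_customers": (60, []),
    "load_products": (45, []),
    "load_orders": (90, ["load_customers", "load_products"]),
    "load_reference": (30, []),
    "build_marts": (40, ["load_orders", "load_reference"]),
}


def serial_minutes(tasks):
    """Total run time when every task runs one after another."""
    return sum(duration for duration, _ in tasks.values())


def parallel_minutes(tasks):
    """Run time when a task starts as soon as all of its dependencies finish."""

    @lru_cache(maxsize=None)
    def finish(name):
        duration, deps = tasks[name]
        return duration + max((finish(dep) for dep in deps), default=0)

    return max(finish(name) for name in tasks)


def blocked_by(tasks, failed):
    """Tasks that cannot run because they sit downstream of the failed task."""
    blocked = set()
    changed = True
    while changed:
        changed = False
        for name, (_, deps) in tasks.items():
            if name == failed or name in blocked:
                continue
            if any(dep == failed or dep in blocked for dep in deps):
                blocked.add(name)
                changed = True
    return blocked


print(serial_minutes(TASKS))                 # 265
print(parallel_minutes(TASKS))               # 190
print(sorted(blocked_by(TASKS, "load_customers")))
```

In this toy example, if load_customers dies, load_products and load_reference still finish; only load_orders and build_marts wait, and those are the ones you'd pick up again with "restart workflow from task".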

Today, the only worklet I have at all is one that contains all the match/consolidate stuff we're doing with IDQ, and even then I only really use it in development. The Informatica rep we hired to code it all for us put it all into a worklet, and I frowned. "What about restartability?" I asked. He agreed it was a problem, so he took it out of the worklet.

My advice? Stay away from worklets.

There are other little things like this in Informatica that can make life hard or easy. Take Sequence Generators, for example. The default number of cached values for a sequence generator is 1,000. Which means if you write 5 rows into a reference table a day, you're going to end up with key values like 1,2,3,4,5,1001,1002,1003,1004,1005,2001,2002,2003,2004... something like that. Kind of ugly. Set it to 10 or 100 instead. Likewise, if you're writing a hundred thousand rows into a table a day, having a value like 1,000 is inefficient: that's 100 trips back for a new block of values. Might as well make it 10,000, which takes only 10. That's 90 fewer generations.
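If you want to see the gaps and the arithmetic for yourself, here's a small Python sketch. It assumes a simplified model where every run grabs a whole block of cached values and throws away whatever it doesn't use; the function names are my own invention, and this is an illustration of the idea, not Informatica's actual implementation.

```python
import math


def simulate_keys(rows_per_run, runs, cache_size, start=1):
    """Keys issued when each run reserves whole blocks and discards the leftovers."""
    keys = []
    next_block_start = start
    for _ in range(runs):
        remaining = rows_per_run
        while remaining > 0:
            # Reserve the next block of cached values.
            block = range(next_block_start, next_block_start + cache_size)
            next_block_start += cache_size
            used = min(remaining, cache_size)
            keys.extend(block[:used])
            remaining -= used
    return keys


def block_fetches(rows, cache_size):
    """How many blocks of cached values a run of `rows` rows has to reserve."""
    return math.ceil(rows / cache_size)


print(simulate_keys(5, 3, 1000))  # 1..5, 1001..1005, 2001..2005
print(simulate_keys(5, 3, 10))    # 1..5, 11..15, 21..25
print(block_fetches(100_000, 1_000) - block_fetches(100_000, 10_000))  # 90
```

Under those assumptions, a cache of 10 keeps the gaps small for a 5-row-a-day table, and a cache of 10,000 cuts the block reservations for a 100,000-row load from 100 down to 10.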

Little things.

Another little thing that new developers will often overlook is the "save session log for these runs" option under the "config object" tab of a session. I always set this to 14, which saves two weeks of logs for a daily load. Leaving it at zero (the default) means you'll only ever store one version of your log - the current one. Bad idea. Better yet, edit the default_session_config (under tasks->session configuration) and change it permanently there so you don't have to think about it (and worse, forget about it) ever again.
