Great IT Disasters: 2010

Friday, September 10, 2010

Random input Informatica

A client of mine posed this interesting dilemna to me the other day.

"I have a project where I need to build a simple but reliable text file importer to drop records into [a table]. But it needs to be configurable so that the client can map where the fields go."

At first, I was stumped - No, I don't think Informatica can do that. That's very much outside the boundaries of what Informatica is designed to do. Obviously, Informatica works best when you have very defined things. It's designed to be a data forklift, after all. And forklifts work best when things reside on pallets. Forklifts tend to fail if you use them to move a stack of beanbags.

But the idea intrigued me. And ironically, my coworker faced a similar issue where a client was sending apparently ever-changing XML definitions for his source. So, he needed a similar solution. A proof of concept was in order.

I created the following table:
create table test_random (
pk int not null,
v1 int,
v2 int,
v3 int,
v4 int,
v5 int,
v6 int,
v7 int,
v8 int,
v9 int,
v10 int )

This is just a generic target table with 10 placeholder value fields, and one primary key column.

And created the following input data file:
key|v3|v9|v2|
1|2|3|4|
2|2|3|4|
3|2|3|4|
4|2|3|4|
5|2|3|4|
6|0|100|42

The file header here tells me where the data goes. For the purpose of the POC, I'm only dealing with inserts, but since a primary key is supplied in my data file, I could easily do a lookup, and process updates as needed. That solution is well known, however, so I didn't bother.

Output:
select * from test_random
pk v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
----------- ----------- ----------- ----------- ----------- ----------- ----------- ----------- ----------- ----------- -----------
1 NULL 4 2 NULL NULL NULL NULL NULL 3 NULL
2 NULL 4 2 NULL NULL NULL NULL NULL 3 NULL
3 NULL 4 2 NULL NULL NULL NULL NULL 3 NULL
4 NULL 4 2 NULL NULL NULL NULL NULL 3 NULL
5 NULL 4 2 NULL NULL NULL NULL NULL 3 NULL
6 NULL 42 0 NULL NULL NULL NULL NULL 100 NULL

(6 row(s) affected)

Success! But how did I do it?

First off, I created two Source qualifiers for my same source - my data file. My intention here is to grab the header with one source, and the detail records for the other source. I created two separate source definitions. For the detail source, I told it to skip the first row. Easy enough. But there is no easy way in Informatica to do the reverse - only read n number of rows. To do this, I created a sequence transformation in that pipeline, and checked the "Reset" option. This resets the sequence for every session run back to zero. The reset option isn't available on re-usable sequence generators, so don't set this as reusable.

Next, I created a filter here that eliminated any row that didn't have a "nextval" from the sequence generator of <= 1. This means that only the very first row I read passes through my filter.

Next, I JOIN the data together. What I actually want here is a full cartesian join. When you cartesion two things together, you get x*y rows as a result, where x and y are the number of rows in your two sources. So, if I had 5 rows in one thing, and 100 in another, I'd end up with 500 cartesion results. In my case, since I only have 1 header, I end up with 1*n rows of data. Only problem is, the joiner transformation doesn't support Cartesion joins - inner, left, right and full, certainly - but not cartesion. What kind of idiot would want that?!

I do- to do it, I created static values of "1" in both pipelines, and used that as my join criteria. It's simply a bogus column that joins every time. Voila - cartesion join.

Now that I have my header "v3, v9, v2", etc. associated with every detail record in my file, I can now apply the bulk of my logic.

I assign a unique port to every port in my target.

Each of these ports is an output-only port containing the following code:

IIF(rtrim(header1) = 'v1', to_integer(detail1),
IIF(rtrim(header2) = 'v1', to_integer(detail2),
IIF(rtrim(header3) = 'v1', to_integer(detail3), NULL)))

IF the value of my header matches the column name associate with the port the value is linked to, then grab the value of that ports information. Obviously, the more input columns you have, the more nesting that is required in this logic. And this block of code (with the static value being the only difference each time), gets deeper and deeper the more columns you have. But we have to evaluate every source column because the value for column 1 in my table could be in the 50th column in my source file. I must check every column until I find one that might match.

Wednesday, May 5, 2010

Adding attachments to vmailer

I use Informatica on a Windows platform, which means I've become pretty good at writing DOS Batch scripts. I'm not a huge fan of this, mind you, but I do what I have to in order to solve the problems I face.

One such problem involves sending an email for error notification from within a Batch file. You know - you've got a Batch script which SFTP's to a remote server to pick up a file from a vendor, file doesn't exist (or is zero bytes in size) and then your job happily processes nothing. Bad idea. So, you need some sort of validation so that you can know to intervene before the client angrily calls you.

One program I've found that I like a great deal is called vmailer.exe. It's a freeware, command line email utility. It's about a simple a program as you'll find (you get what you pay for), but for what I need, it works perfectly.

Except that the documentation is extremely limited, and the website's own FAQ is particularly useless.

By default, vmailer works as follows:
c:\vmailer.exe

V-Mailer v1.2.1 (www.virdi-software.com)

Usage: VMailer.exe Filename SmtpServer To From User Password

where:
FileName = text file containing the messsage
SmtpServer = server name or ip address e.g. smtp.domain.com
To = recipient's address e.g. him@domain.com
From = sender's address e.g. me@domain.com
User = (optional) user name if authentication required
Password = (optional) password if authentication required

Most of this is pretty self-explanatory. The user/password at the end I've never used. So, that leaves just 4 options. The second one might involve an email to your sysAdmin to ask what the exchange server is. If you know where to find this, you can dig through Outlook and get this information as well. The To and From are the email addresses, and I'm pretty sure you can put whatever you want in the From line.

That leaves the "filename". In its simplest form you could put something like this in your filename:
"Hello world"

That would show up in the body of your message. But your Subject line would be blank. The vmailer FAQ is helpful in solving this problem, however, and suggests you need to do some formatting of your filename in order to get a subject line in there. This suggests that there is some intelligence in this file, and it's not merely the contents of your email body.

That gives us something a bit more complex:
From: Curtis
To: Valued Recipients
Subject: This is a Subject line.

This is a test.

That's better! Now we get a subject line. And a bit more power over the "To" and "From" which you can play with if such things amuse you. (they do me)

But most recently, I wanted to be able to add an attachment to the file. This proved harder, and the web proved particularly unkind to me in solving this dilemna. I did find a good site which finally revealed a simple solution, but I had to dig and dig for it, which is why I decided to put it here for my own sake, if not for the sake of others trying to find similar information.

MIME-Version: 1.0
Content-Type: text/plain ; name="test_20100505.csv"
Content-Transfer-Encoding: Base64
Content-Disposition: attachment; filename="test_20100505.csv"
From: Curtis
To: Valued Recipients
Subject: This is an Attachment Test

This is a test.

Now, I'm not going to sit here and pretend that I know what all of that stuff means exactly, but I do know that there are multiple content-types, and if you google "MIME content-types" you can get a list of all the types out there that a MIME email accepts.

Obviously, this is a bit outside the scope of Vmailer, since at this point we're talking about MIME, and no longer about vmailer. But if you're like me, you enjoy simple solutions. :)

Thursday, March 18, 2010

Badfiles

Whenever rows get rejected from an insert/update/whatever, they get written to the badfiles. This is kind of like "time out" for naughty records. Unlike naughty children, however, you can choose to completely ignore these records for all eternity. That might not be what you want to do, however.

Assuming you care what's in that file, you'll need to read it. Thing is, it's kind of a convoluted mess upon close examination. Informatica creates a dump of your record plus a bunch of other stuff into a comma-delimited flat file. You can read this file like any other source record, except it's not the same as your failed session's target file definition.

In fact, it looks a little something like this:
0,D,26,D,,N,138,D,,N,,N,,N,,N,7,D,,N,,N,1,D....

The first column is a row_indicator (smallint). It contains values from 0 to 9. The most common being 0 or 1 which means the row was either inserted or updated and failed.

Then, there is a column indicator. The first one of this is always a "D", which means it's valid data.

Then you get every column that was written to your previous target that failed, followed by a column indicator after every column.

So, if you tried to write to a table with 3 columns A, B and C, and failed on a unique constraint violation, your badfiles table will look like this:

0,D,A,D,B,D,C,D

Where all the "D" values are column indicators indicating that the columns were valid data, and the 0 indicates that it was an insert. In the event that a column was null, you'll two commas (,,) followed by an "N" for Null.

For more information on badfiles, consult Informatica's excellent help. This topic is listed under "reject files" - you'll have a hard time searching for it under "badfiles".

Thursday, February 18, 2010

Convert UNIX epoch time in Informatica

Handy function for converting UNIX epoch time in Informatica. UNIX epoch time is a int(11) field which contains the number of elapsed seconds since midnight 1/1/1970.

To convert to a GMT date in Informatica, I created a mapplet. The mapplet receives the EPOCH_TIME as a variable.

IIF(EPOCH_TIME > 0,
ADD_TO_DATE(to_date('19700101', 'yyyymmdd'),'SS',EPOCH_TIME), NULL)

The original version I had created of this lacked the IIF test. So, in cases where epoch time was 0, I was getting 1/1/1970 as my result. Which I didn't want.

Enjoy.

Great IT Disasters