This is kind of a big topic, but I felt like it was important to talk about.
For the purposes of this discussion, I'm going to use the word "folder" to refer to a logical container within Informatica itself, and "directory" to refer to a path on a hard drive within the operating system.
In other words, mappings are stored in folders, and files are stored in directories. Got it? Good.
When you create a new folder using the Repository Manager in Informatica, it only creates the folder within Informatica; no directories are created on disk. That means if you're writing a log file somewhere, it's just going to be written to some default location. That probably varies by installation, but in the environment I use, it's:
E:\Informatica\Powercenter8.6.1\server\infa_shared
Now, this isn't terribly useful, and it gets cluttered quickly, especially if you have hundreds of sessions writing logs into the same directory. Not pretty. That might be tolerable if you only had logs, but Informatica stores all kinds of files there: input files, bad files, target files, you name it.
Fortunately, within that path, Informatica creates a default cache directory (.\cache under the directory above, for example). This is handy, but still not ideal.
A better strategy is to create a specific directory for every folder within Informatica. This keeps things organized. But you have to go one step further: you have to modify your workflow and session properties so they actually write to these directories. Not that big of a deal; you just have to be diligent about it.
We create the same set of subdirectories under each folder's root directory. So, if we have a folder called "Sales", we'll create a .\Informatica\Sales\ directory (where you put this is up to you), and then subdirectories below that. We use:
Backup\
BadFiles\
Cache\
Docs\
Export\
ExtProc\
Headers\
Parameter\
Scripts\
SessLogs\
Sql\
SrcFiles\
Temp\
TgtFiles\
WorkflowLogs\
Most of these were defined before I started here, so I'm not sure of the reasoning behind a few of them. I'm sure the previous admin's intentions were noble, but in reality, we only use a few of these.
Cache\, SessLogs\ and WorkflowLogs\ are directories every workflow is going to need. Unless you're writing out a reject file, you don't need BadFiles\.
Backup\ would be a handy place to store data files, but I'm in favor of keeping data files in a completely separate location, which also calls the value of SrcFiles\ and TgtFiles\ into question. Honestly, I only ever use those when I'm testing, so they do come in kind of handy.
Docs\ is questionable. I can't imagine anyone ever looking for documentation here; storing it online is a far better option. If you ever export your mappings (perhaps as a backup?), Export\ would be a good choice. Of course, this makes Backup\ completely redundant. I have no idea what ExtProc\ was supposed to be for.
Headers\ is useful if you're writing to files that need headers; that's where I store mine. Parameter\ is also handy if your workflow uses a parameter file.
Scripts\ and Sql\ are both questionable. First off, I don't know why an Informatica mapping would need a SQL file, and any Korn shell or batch scripts that might be called by the workflow are probably better off stored somewhere else. Lastly, Temp\ is also fairly useless. Informatica already has a Cache\ directory, so Temp\ is redundant.
That leaves the following useful directories:
BadFiles\ (optional, perhaps)
Cache\
Export\ (optional)
Headers\ (optional)
Parameter\ (optional)
SessLogs\
WorkflowLogs\
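If you'd rather not create these by hand for every new folder, a short script can do it. Below is a minimal Python sketch; the root path and folder name are whatever you choose (nothing here calls an Informatica API), and it only creates the directories on disk. You still have to point each workflow's and session's log, cache and file properties at them yourself.

```python
from pathlib import Path

# The recommended subdirectories from the list above. BadFiles, Export,
# Headers and Parameter are optional; remove any you don't need.
SUBDIRS = (
    "BadFiles",
    "Cache",
    "Export",
    "Headers",
    "Parameter",
    "SessLogs",
    "WorkflowLogs",
)


def create_folder_dirs(root, folder, subdirs=SUBDIRS):
    """Create <root>/<folder>/<name> for each name in subdirs.

    Returns the list of directory paths. Directories that already exist are
    left untouched (exist_ok=True), so running this twice is safe.
    """
    base = Path(root) / folder
    paths = []
    for name in subdirs:
        path = base / name
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths
```

For example, create_folder_dirs(r"E:\Informatica", "Sales") would produce E:\Informatica\Sales\Cache\, E:\Informatica\Sales\SessLogs\ and so on. Because mkdir is called with exist_ok=True, re-running it against a folder that's already set up does no harm.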
If you insist on having SrcFiles\ and TgtFiles\ directories, I'd recommend a single location shared by all workflows. That way, when your drive fills up because someone decided to test a 14 GB input file, you can find the culprit easily, without having to scan through a bunch of subdirectories looking for candidate files to remove. This also makes it a little easier for workflows to share a common data pool, for example when two workflows in different folders read from the same input.
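With one shared directory, finding the file that filled the drive is a single listing. A minimal Python sketch, assuming you pass in the path of that shared directory; it deliberately looks only at files directly inside it, not at subdirectories.

```python
from pathlib import Path


def largest_files(directory, top_n=10):
    """Return up to top_n (size_in_bytes, path) pairs, largest first.

    Only regular files directly inside directory are considered.
    """
    sized = [
        (p.stat().st_size, p)
        for p in Path(directory).iterdir()
        if p.is_file()
    ]
    # Sort on size only, so ties never need to compare Path objects.
    sized.sort(key=lambda item: item[0], reverse=True)
    return sized[:top_n]
```

Printing each size divided by 1024 ** 3 gives the figure in GB, which is usually enough to spot the 14 GB test file.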
Lastly, I would be remiss if I didn't state the obvious: the drive or mount hosting all these directories should be large, large enough for all those Cache\ directories. Alternatively, you could create a dedicated mount or drive just for Cache\ files. If that location is too small and fills up, your session will fail while building its lookup cache files. That's bad.
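One way to avoid finding out the hard way is to check free space on the cache drive before a big load. This is a minimal Python sketch using the standard library's shutil.disk_usage; the 50 GB threshold is an arbitrary example, not an Informatica setting, and how you hook it into your scheduler or workflow is up to you.

```python
import shutil


def cache_space_ok(cache_dir, min_free_gb=50.0):
    """Return (ok, free_gb) for the drive or mount holding cache_dir.

    min_free_gb is an example threshold; pick one that fits your largest
    lookup caches.
    """
    usage = shutil.disk_usage(cache_dir)
    free_gb = usage.free / (1024 ** 3)
    return free_gb >= min_free_gb, free_gb
```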
Also, session logs can become VERY large if a session produces a lot of errors or if verbose logging is turned on, so be sure to clean them up when they're no longer needed.
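Here's a minimal OS-level cleanup sketch in Python, assuming "old" means "not modified in the last N days" and that you point it at one SessLogs\ or WorkflowLogs\ directory at a time. It defaults to a dry run that only lists what it would delete.

```python
import time
from pathlib import Path


def purge_old_logs(log_dir, max_age_days=30, dry_run=True):
    """Delete (or, if dry_run, just list) files in log_dir older than max_age_days.

    Age is based on each file's last-modified time. Returns the affected paths.
    """
    cutoff = time.time() - max_age_days * 86400
    old = [
        p for p in Path(log_dir).iterdir()
        if p.is_file() and p.stat().st_mtime < cutoff
    ]
    if not dry_run:
        for p in old:
            p.unlink()
    return old
```

Run it with dry_run=True first and check the list before letting it delete anything.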
Wednesday, April 29, 2009
Many of these directories, as you probably already know, are part of the default server install. They can be configured to live on separate locations and file systems depending on your hardware, network, storage, etc. ExtProc is for binary code used by the External Procedure transformation and is a carryover from previous versions; after version 7.x, Informatica introduced Custom transformations such as the Union transformation. The docs folder also appears to be created by the default install.
Thanks for the info, Sanjay. That's good to know.
Hey Sanjay, I'm currently trying to customise my location of Informatica's Cache (we need more space for it), where can this be done?
Thanks guys, good article