I have this ongoing feud with SQL Server - no, not Microsoft, and not any specific purpose, the database itself.
See, in general, I like the idea behind the varchar() data type, but there's something wholly unclean about it that bugs me. I think it's the variability in it that I don't care for. Whatever the reason, a while back I set out to make some "adjustments" to the database. I'd read that varchar() columns in SQL Server consumed n+2 bytes of space, where n was the length of the specific data element being stored and 2 bytes were needed for varchar() overhead. Enter the brain surgeons that were the previous developers to the system I support. For some hair-brained reason, we actually had varchar(1) columns in our database. Proposterous! For all intents and purposes, a varchar(1) column should take up a minimum of 2 bytes (for null values) and 3 bytes for non-null values. What a ridiculous waste of space!! And that's just the data modeler - imagine how much database space that consumes. :)
We also had varchar(10) for a field that was ALWAYS 4 bytes long - and varchar(70) for fields that never, ever, ever exceeded 20 bytes. My sense of perfection and harmony within the universe was being violated. I needed to do something about this.
So, quietly, I set about fixing this gross injustice. First, I changed the varchar(1) fields to char(1) fields. Next, I set my sights on those that needed to be char(4) - but then I got clever - "I know, I'll check the database properties for the table. That way, I can see how much space I'm saving!" - so I checked the size of the table before and after one of the field conversions - BIGGER!! What the heck!? "How can that be?!", I pondered, shaking my fist at the screen. "Everything I know is wrong!"
This is where the feud comes in. SQL Server is currently laughing at me, but I'm not done yet. I'll figure this mystery out eventually.
Meanwhile, Informatica decided to wreak havoc on my life. How? Well, apparently, there's a major difference in the way Informatica treats, say, a varchar(10) and a char(3). One of the fields I changed was a transaction type, and we only have two - and the biggest one is 3 bytes long, the other is 2. So, I changed the field to be char(3). Why not? Every field is populated (it's not null), so at worst I'm losing 1 byte.
Except that when Informatica pulls this data back from a lookup, when it was a varchar(10) before, it was handing me "XX" on a silver platter. Now, because it is char(3), I'm getting back "XX ". Only, this time instead of on a silver platter, it comes back with a side of "here's an extra space you didn't ask for, nor do you want". Kind of like putting nuts in moist, chewy brownies. Why do people do that?!
So, take this to heart - if you decide to change a field from varchar() to char(), be aware that Informatica is going to handle that completely differently - in ways that you may not like.
Tuesday, May 5, 2009
Subscribe to:
Post Comments (Atom)
Too funny, love the varchar(1), thats great. I hear you on Informatica adding spaces; thats burned me a few times now. Seems like I'm always trimming just to be safe. Thanks, Jody Pilsworth
ReplyDelete