i’ve blogged about this in many places, at many times. in this very short entry, i will show you what happens if you try to use hash functions as business keys. i will also show you exactly why you shouldn’t rely on them for delta processing (change detection) either! of course, not all hashes are created equal…
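the hashes i am specifically referring to that cause a ton of trouble are crc16 and crc32 (cyclic redundancy check, 16 bit and 32 bit), as computed by the tool linked below. hash functions that behave this way fall into a category known as position independent hashes. note that a standard crc (for example the crc-32 in zlib) is sensitive to byte order, so test the exact implementation you plan to use.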
a position independent hash function will produce the exact same hash output for the exact same bits, even if they are in a different order!
for example: take the following strings, and see the output:
“this is a test”
“test is a this”
“a is test this”
“this test is a”
“this isa test”
they all produce exactly the same hash number:
crc16 & crc32 = 1601 (decimal)
try it yourself: http://webnet77.com/cgi-bin/helpers/crc.pl
it doesn’t matter what order the bits are in: if you have all the same characters, and exactly the same length, the hash matches!
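to see the effect in isolation, here is a minimal python sketch. it assumes a deliberately naive additive checksum (the hypothetical `additive_checksum` helper below). this is not the crc algorithm and not necessarily what the linked tool runs; it is only the simplest hash i can think of that ignores position.

```python
def additive_checksum(text: str, bits: int = 16) -> int:
    """Hypothetical position independent hash: sum of the UTF-8 bytes,
    truncated to `bits` bits. For illustration only."""
    return sum(text.encode("utf-8")) % (1 << bits)


# The same words and the same number of spaces, in different orders.
samples = [
    "this is a test",
    "test is a this",
    "a is test this",
    "this test is a",
]

checksums = {s: additive_checksum(s) for s in samples}
for s, c in checksums.items():
    print(f"{s!r:20} -> {c}")

# A sum ignores order, so every reordering collapses to one value.
print("distinct checksums:", len(set(checksums.values())))
```

with this sketch the four reordered strings share one checksum, because addition does not care about order. a string with one space fewer, such as “this isa test”, comes out different here, since a byte is missing from the sum.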
moral of the story: if you’re going to use this type of function as a business key or a primary key in your data warehouse, then i would strongly suggest you try to locate a hash function that is position dependent.
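as one possible sketch of a position dependent alternative, the snippet below uses sha-256 from python’s standard `hashlib` module. the choice of sha-256 and the helper name `business_key_hash` are assumptions made for illustration, not a recommendation taken from a specific product or standard.

```python
import hashlib


def business_key_hash(text: str) -> str:
    """Hypothetical helper: hex SHA-256 digest of the UTF-8 encoded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# The same reordered strings used earlier in this entry.
samples = [
    "this is a test",
    "test is a this",
    "a is test this",
    "this test is a",
    "this isa test",
]

digests = [business_key_hash(s) for s in samples]
for s, d in zip(samples, digests):
    print(f"{s!r:20} -> {d[:16]}...")

# Each input should map to its own digest, because byte order feeds the hash.
print("distinct digests:", len(set(digests)))
```

a longer digest costs more storage per key than a 16 or 32 bit value, so weigh that against the collision risk for your own data volumes.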
hope this helps,