I’ve blogged on this in many places, at many times. In this very short entry, I will show you what happens if you try to use HASH functions as business keys. I will also show you exactly why you shouldn’t rely on them as DELTA processors either! Of course, not all hashes are created equal…
The hashes I am specifically referring to that cause a TON of trouble are CRC16 and CRC32 (cyclical redundancy checksum, 16 bit and 32 bit). These hash functions fall in to a category known as POSITION INDEPENDENT HASHES.
A position independent hash function will produce the EXACT SAME HASH OUTPUT for the exact same bits, EVEN if they are in a different order!
For example: take the following strings, and see the output:
“this is a test”
“test is a this”
“a is test this”
“this test is a”
“this isa test”
They ALL produce EXACLTY the same hash number:
CRC16 & CRC32 = 1601 (decimal)
Try it yourself: http://webnet77.com/cgi-bin/helpers/crc.pl
It doesn’t matter what order the bits are in, if you have ALL the same characters, and exactly the same length, the hash matches!
Moral of the story: if you’re going to use this type of function as a business key or a primary key in your data warehouse, then I would strongly suggest you try to locate a hash function that IS position dependent.
Hope this helps,
PS: you can find more helpful tips and techniques inside my coaching area.