Why Hashes CANNOT be business keys

i’ve blogged on this in many places, at many times.  in this very short entry, i will show you what happens if you try to use hash functions as business keys.  i will also show you exactly why you shouldn’t rely on them as delta processors either!  of course, not all hashes are created equal…

the hashes i am specifically referring to that cause a ton of trouble are crc16 and crc32 (cyclical redundancy checksum, 16 bit and 32 bit).  these hash functions fall in to a category known as position independent hashes.

a position independent hash function will produce the exact same hash output for the exact same bits, even if they are in a different order!

for example: take the following strings, and see the output: 

“this is a test”
“test is a this”
“a is test this”
“this test is a”
“this isa  test”
“testisathis   ”

they all produce exaclty the same hash number: 

crc16 & crc32 = 1601 (decimal)

try it yourself: http://webnet77.com/cgi-bin/helpers/crc.pl

it doesn’t matter what order the bits are in, if you have all the same characters, and exactly the same length, the hash matches!

moral of the story: if you’re going to use this type of function as a business key or a primary key in your data warehouse, then i would strongly suggest you try to locate a hash function that is position dependent.

hope this helps,
dan linstedt
ps: you can find more helpful tips and techniques inside my coaching area.

Tags: , ,

No comments yet.

Leave a Reply