Why Hashes CANNOT be business keys

I’ve blogged on this in many places, at many times.  In this very short entry, I will show you what happens if you try to use HASH functions as business keys.  I will also show you exactly why you shouldn’t rely on them as DELTA processors either!  Of course, not all hashes are created equal…

The hashes I am specifically referring to that cause a TON of trouble are CRC16 and CRC32 (cyclical redundancy checksum, 16 bit and 32 bit).  These hash functions fall in to a category known as POSITION INDEPENDENT HASHES.

A position independent hash function will produce the EXACT SAME HASH OUTPUT for the exact same bits, EVEN if they are in a different order!

For example: take the following strings, and see the output: 

“this is a test”
“test is a this”
“a is test this”
“this test is a”
“this isa  test”
“testisathis   ”

They ALL produce EXACLTY the same hash number: 

CRC16 & CRC32 = 1601 (decimal)

Try it yourself: http://webnet77.com/cgi-bin/helpers/crc.pl

It doesn’t matter what order the bits are in, if you have ALL the same characters, and exactly the same length, the hash matches!

Moral of the story: if you’re going to use this type of function as a business key or a primary key in your data warehouse, then I would strongly suggest you try to locate a hash function that IS position dependent.

Hope this helps,
Dan Linstedt
PS: you can find more helpful tips and techniques inside my coaching area.

Tags: , ,

No comments yet.

Leave a Reply