Frequently Asked Questions
Questions and answers on Data Vault modeling, methodology, architecture, and implementation. I will do my best to answer questions here as they come up. Feel free to post new questions, but please read the category descriptions first so that each question lands in the right place.
Have a question? Submit one here.
Data Vault Implementation
I recently saw a post about SQL Server deprecating its older hashing algorithms, which requires the newer SHA2 versions and therefore increases the length of the hash. I tried to reply to that post but could not for some reason, so I'll pitch the question here. We currently get around the longer hash by converting the HASHBYTES result to a BIGINT. That gives us integer-based joins instead of character-based hash joins, and it provides good partition distribution, but we have always wondered whether it increases the risk of collisions. We have tested this with all the algorithms and have yet to come across a collision, so we are keeping our fingers crossed. I also wondered why this was not mentioned in the book as an alternative. Is it because it could increase the chance of collisions, or is there some other consideration, Dan?
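To make the collision question concrete, here is a minimal sketch in Python rather than T-SQL. It is not the poster's actual code: the function names `hash_to_bigint` and `collision_probability` are hypothetical, the sketch assumes the BIGINT conversion keeps the rightmost 8 bytes of the digest, and the estimate uses the standard birthday approximation, which assumes hash values are uniformly distributed.

```python
import hashlib
import math


def hash_to_bigint(business_key: str, algorithm: str = "sha256") -> int:
    """Hash a business key and keep only 64 bits, as a signed integer.

    Assumption: the BIGINT conversion keeps the rightmost 8 bytes of the
    digest, so the same slice is taken here.
    """
    digest = hashlib.new(algorithm, business_key.encode("utf-8")).digest()
    return int.from_bytes(digest[-8:], byteorder="big", signed=True)


def collision_probability(n_keys: int, bits: int) -> float:
    """Birthday approximation: chance of at least one collision among
    n_keys values drawn uniformly from a space of 2**bits values."""
    exponent = -n_keys * (n_keys - 1) / (2 * 2 ** bits)
    # -expm1(x) equals 1 - exp(x) but stays accurate when x is tiny.
    return -math.expm1(exponent)


if __name__ == "__main__":
    print(hash_to_bigint("CUSTOMER|1001"))
    for bits in (64, 128, 256):
        p = collision_probability(1_000_000_000, bits)
        print(f"{bits:>3} bits, 1e9 keys: {p:.3e}")
```

Whatever algorithm produces the digest, only 64 bits survive the conversion, so the estimate depends on the number of keys and not on whether MD5 or SHA2 was used. Not seeing a collision in testing is consistent with this for small key counts, but the probability grows roughly with the square of the number of keys.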
HASHBYTES and MD5 deprecated in SQL Server 2016
I found this information in the SQL Server 2016 documentation.
Beginning with SQL Server 2016, all algorithms other than SHA2_256, and SHA2_512 are deprecated. Older algorithms (not recommended) will continue working, but they will raise a deprecation event.
Would switching to SHA2_256 or SHA2_512 have a performance impact? Would you recommend either of the supported algorithms as a DV standard?
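One way to get a rough feel for the cost difference is to compare digest sizes and hashing time outside the database. The sketch below uses Python's `hashlib`, not SQL Server's HASHBYTES, so the timings are only indicative and would need to be repeated in T-SQL against real data. The helper names `digest_sizes` and `time_algorithms` and the sample payload are hypothetical.

```python
import hashlib
import timeit

# MD5 and SHA-1 are among the deprecated choices; SHA-256 and SHA-512
# correspond to the SHA2_256 and SHA2_512 options that remain supported.
ALGORITHMS = ("md5", "sha1", "sha256", "sha512")


def digest_sizes() -> dict:
    """Return the digest length in bytes for each algorithm."""
    return {name: hashlib.new(name).digest_size for name in ALGORITHMS}


def time_algorithms(payload: bytes, repeats: int = 20000) -> dict:
    """Return the total seconds spent hashing payload `repeats` times."""
    return {
        name: timeit.timeit(
            lambda n=name: hashlib.new(n, payload).digest(), number=repeats
        )
        for name in ALGORITHMS
    }


if __name__ == "__main__":
    sizes = digest_sizes()
    timings = time_algorithms(b"CUSTOMER|1001|2016-06-01")
    for name in ALGORITHMS:
        print(f"{name:>6}: {sizes[name]:>2} bytes, {timings[name]:.4f} s")
```

The digest sizes are fixed by the algorithms themselves: 16 bytes for MD5, 20 for SHA-1, 32 for SHA-256, and 64 for SHA-512. The wider SHA2 digests mean wider hash key columns and indexes, which is usually where the storage and join cost shows up, in addition to the time spent computing the hash.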