When a Business Key is not available

have you found yourself in a situation where a natural world or business key is not available?  we all have.  what do you do in this case? how do you construct a hub?  is there a way in which to still build the data vault?  all of these are valid questions, and over the next few months i’ll try to answer them.  let’s see though if we can’t offer a few suggestions in this posting.

as we teach in the certification class (http://datavaultalliance.com) you will find that if you don’t have a business key to construct  a proper hub, that you need to rely on several other techniques.  these techniques are listed in a progessive order below:

  1. look for multiple fields that construct as much of a unique key as possible – then try to determine if these fields are actually descriptive in nature.  if they are character based fields, they have a higher chance of being “fat-fingered” or entered wrong in the application than if they are numeric or date / time fields.
  2. still can’t find a key?  look for a source system numeric identifier – yes, unfortunately a source system sequence number.  but try to ensure that the application programmers have “exposed” this un-natural key to the business, if they have (either on a report or on an application screen) then their is a good chance that business users are actually using this field as a business key.  if these source systems have not exposed these numbers to the business (they shouldn’t) , then there is a risk that these sequences may be “re-assigned” during full restore of the source database.  this would wreak havoc on your data warehouse (data vault or not) unless you have cdc (change data capture) on the source system.
  3. still can’t find even a numeric key?  then you are out of luck…  my question to you then & to the business is: how do you identify unique rows?  what’s the business process for selecting a single row out of multiples for editing purposes?  identify the business process, and try to establish a mechanical method for choosing unique rows.  by mechanical i mean: something the machine recognizes, that doesn’t need human intervention/human decision making.

ok – so this is a brief list of things you can try.  the risk of using numerical sequences from the source is potentially huge.  especially when cross-integrating source data into the same hub..  however, this is somewhat mitigated (same key means different things) by separating satellites by source system feed.

at the end of the day, there are more ideas around business keys, and surrogate keys that are taught in the class.  ideas i don’t have time to re-iterate here now, but maybe in the future.

this can cause huge confusion when you go to get data out of the data vault.  all i can say here is: document this process well.  ensure you’ve documented the gap between “what the business believes is a unique business key” and “what the source systems tell you is the real data”

hope this helps, have thoughts?  wish to contribute a solution?  love to hear from you.

dan l

Tags: , , ,

2 Responses to “When a Business Key is not available”

  1. Peter Larson 2015/07/03 at 9:20 am #

    Hi Dan, I am a recent convert to DV. I am trying to apply the model to a situation I have.

    Is there anywhere I can find an example of a Key Satellite? I have seen it referred to a few times, but no concrete explanations as to design. Would it be one row per hub item, and each potential key as a separate column? Or one row per potential key?

    I am facing a warehousing situation where several entities have no natural business key but where surrogate keys are used by the business. There are more than one system managing these entities and we have not yet adopted a full MDM where I could say one system could act as the master key. In fact my whole dataset is full of such situations because the business is very fragmented. I have upwards of 40 different source systems to contend with.

    I am torn between putting everything into a single hub and using a Same-As-Link to resolve versus a Key Satellite. Do you have any thoughts about when it is preferred to use which?

  2. Dan Linstedt 2015/07/03 at 12:08 pm #

    Hi Peter,

    I don’t know whom is discussing “Key Satellites”, or in what context. I don’t recall ever posting anything about the subject, because in my opinion, no such thing exists… If I did post something, it may have been specific to a special case. Anyhow, in order to define something like that, I’d need to better understand your specific business case.

    In reading the rest of your question, I really do need to look at your business case. Unfortunately there is not enough information here for me to make a decision. I would ask questions like: what grain are these surrogates at in their source? do they represent the same source data in multiple systems? are they supposed to represent the same data in multiple systems? what is the driving business need to integrate these keys in the first place? What is the business case?

    Let me know if we can schedule some consulting time over the web, and we can try to dive in to this to get an answer. I think maybe 2 hours of time should solve it.

    Dan Linstedt

Leave a Reply