Data Vault Modeling

The data vault model is only one piece of the data warehousing puzzle. It is essentially a hub-and-spoke model made up of hubs (centralized locations for business keys), links (associations across or between business keys), and satellites (the data warehousing component that stores descriptive data over time).
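To make the three building blocks concrete, here is a minimal Python sketch. The post doesn’t prescribe column names or a key-generation rule, so the sketch assumes some common conventions: MD5 hash keys computed from normalized business keys, plus a load date and a record source on every row. `Hub`, `Link`, `Satellite`, and `hash_key` are hypothetical names, not part of any standard or library.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime


def hash_key(*business_keys: str) -> str:
    """Assumed convention: MD5 of the trimmed, upper-cased keys joined with '|'."""
    normalized = "|".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Hub:
    """Centralized location for one business key."""
    hub_key: str
    business_key: str
    load_date: datetime
    record_source: str


@dataclass(frozen=True)
class Link:
    """Association across or between business keys (stored here as hub keys)."""
    link_key: str
    hub_keys: tuple
    load_date: datetime
    record_source: str


@dataclass
class Satellite:
    """Descriptive data for one parent hub or link at one point in time."""
    parent_key: str
    load_date: datetime
    record_source: str
    attributes: dict = field(default_factory=dict)


# Usage: a customer places an order; the customer's name lives in a satellite.
loaded = datetime(2015, 1, 24)
customer = Hub(hash_key("C-100"), "C-100", loaded, "crm")
order = Hub(hash_key("O-9"), "O-9", loaded, "erp")
placed = Link(hash_key("C-100", "O-9"), (customer.hub_key, order.hub_key), loaded, "erp")
customer_name = Satellite(customer.hub_key, loaded, "crm", {"name": "Acme"})
```

Because the hub key is derived only from the business key, any source that delivers "C-100" lands on the same hub row; the link and satellite simply point back at it.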

[Figure: complex data vault]

We often forget that a hub-and-spoke model appears to be infinitely scalable. Because of this scalability, we can rearrange the hubs and spokes in many different ways, for different time frames. It’s as if we are evolving the hub-and-spoke model over time, which is exactly what happens when the business changes and the model changes with it. This seemingly innocuous form of data storage is highly scalable: it can be partitioned many different ways. For instance, it can be split across geographic regions, by security access, or by grain of data, to name just a few.
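A minimal sketch of the partitioning idea, assuming rows are plain dictionaries and each partition is just a named list. The `partition` helper and the `region`, `access`, and `grain` columns are hypothetical; a real warehouse would use the database’s own partitioning or separate physical tables. The point it illustrates is that the same rows can be split by any rule without touching their keys.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Hypothetical satellite rows; hub and link rows could be split the same way.
rows = [
    {"customer_hk": "a1", "region": "EU", "access": "public", "grain": "daily"},
    {"customer_hk": "b2", "region": "US", "access": "restricted", "grain": "daily"},
    {"customer_hk": "c3", "region": "EU", "access": "restricted", "grain": "monthly"},
]


def partition(rows: Iterable[dict], classify: Callable[[dict], str]) -> Dict[str, List[dict]]:
    """Route each row to a named partition chosen by `classify`; rows are not modified."""
    parts: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        parts[classify(row)].append(row)
    return dict(parts)


# The same data, split three different ways.
by_region = partition(rows, lambda r: r["region"])
by_access = partition(rows, lambda r: r["access"])
by_grain = partition(rows, lambda r: r["grain"])
```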

These uses of hub-and-spoke models make the data vault an extremely powerful design. Please keep this in mind: I didn’t create the design, I merely adapted it for data warehousing. I applied formalized rule sets and standards that make it possible to load, track, query, and use it as a data warehouse.

There are a few folks out there who like to break rules, break standards, and try to improve upon the simplicity of this model, but again, this isn’t my model. If you look out your window and see a tree, a simple-looking tree, you will see in essence a hub-and-spoke model: one central trunk, which usually splits into two major trunks, which split off into major branches, followed by minor branches, followed by leaves, and so on. These types of designs are all over nature. So why shouldn’t we use a design that came from nature?

Scientists who study the brain say that neurons, synapses, and dendrites are organized this way in three dimensions. Great! Put the data vault model in 3D and play with it. You get a very powerful result.

[Figure: simpler hub and spoke model]

For the hubs and links I borrowed concepts from third and fourth normal form data modeling. For the satellites I borrowed concepts from dimensional modeling and inverted the key structure. Why, then, are some people still not convinced that this is “old hat”? This is not new stuff, folks; it’s just rearranged a bit from what you are used to.
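One possible reading of “inverted the key structure”: in a star schema the fact table points at a dimension’s surrogate key, whereas a satellite takes its parent’s key and adds a load date to form its own primary key. The sketch below assumes that reading and stores a hypothetical satellite as a dictionary keyed by `(parent_key, load_date)`; the `as_of` helper shows how that key shape supports point-in-time queries.

```python
from bisect import bisect_right
from datetime import date
from typing import Dict, Optional, Tuple

# Hypothetical satellite: primary key is (parent hub key, load date); value is the descriptive payload.
satellite: Dict[Tuple[str, date], dict] = {
    ("cust-1", date(2015, 1, 1)): {"city": "Denver"},
    ("cust-1", date(2015, 3, 1)): {"city": "Boston"},
    ("cust-2", date(2015, 2, 1)): {"city": "Austin"},
}


def as_of(sat: Dict[Tuple[str, date], dict], parent_key: str, when: date) -> Optional[dict]:
    """Return the attributes in effect for parent_key on `when`, or None if nothing was loaded yet."""
    load_dates = sorted(d for (key, d) in sat if key == parent_key)
    i = bisect_right(load_dates, when)  # count of load dates on or before `when`
    return sat[(parent_key, load_dates[i - 1])] if i else None


print(as_of(satellite, "cust-1", date(2015, 2, 15)))  # {'city': 'Denver'}
```

History is never overwritten: a new source value becomes a new row under the same parent key with a later load date.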

Again, the magic happens in the rules of loading and querying. The magic continues with scalability and auditability at the central core. Remember: how the human brain interprets memories comes down to experience. Every person who experiences the same event remembers or interprets it differently. Therefore, in the data warehousing/BI world, we should store the data as it stands on the source system, changed only by being integrated through relationships, and interpret it on the way out to the data marts. This is absolutely critical to remember; I will cover more of this in a future post.
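The store-raw, interpret-on-the-way-out rule can be sketched as follows. The post does not say how changes are detected, so this assumes a hash of the attributes (often called a hash diff in later Data Vault practice) compared against the latest stored row; `load_satellite`, `current_view`, and the row layout are hypothetical. The satellite keeps source values exactly as delivered, and business meaning is applied only when a mart is built.

```python
import hashlib
import json
from datetime import date
from typing import Callable, Dict, List


def hash_diff(attributes: dict) -> str:
    """Assumed convention: MD5 of the attributes serialized as sorted-key JSON."""
    return hashlib.md5(json.dumps(attributes, sort_keys=True).encode("utf-8")).hexdigest()


def load_satellite(history: List[dict], parent_key: str, load_date: date, attributes: dict) -> bool:
    """Append source values unchanged, but only if they differ from the latest row for this key."""
    rows = [r for r in history if r["parent_key"] == parent_key]
    latest = max(rows, key=lambda r: r["load_date"], default=None)
    new_diff = hash_diff(attributes)
    if latest is not None and latest["hash_diff"] == new_diff:
        return False  # nothing changed on the source, so no new row
    history.append({"parent_key": parent_key, "load_date": load_date,
                    "hash_diff": new_diff, "attributes": dict(attributes)})
    return True


def current_view(history: List[dict], interpret: Callable[[dict], dict]) -> Dict[str, dict]:
    """Mart side: take the latest row per key and apply one business interpretation to it."""
    latest: Dict[str, dict] = {}
    for r in history:
        key = r["parent_key"]
        if key not in latest or r["load_date"] > latest[key]["load_date"]:
            latest[key] = r
    return {k: interpret(r["attributes"]) for k, r in latest.items()}


history: List[dict] = []
load_satellite(history, "cust-1", date(2015, 1, 1), {"status": "A"})
load_satellite(history, "cust-1", date(2015, 1, 2), {"status": "A"})  # unchanged, skipped
load_satellite(history, "cust-1", date(2015, 1, 3), {"status": "I"})

# Two marts interpret the same raw history differently.
labels = current_view(history, lambda a: {"status": {"A": "Active", "I": "Inactive"}[a["status"]]})
flags = current_view(history, lambda a: {"is_active": a["status"] == "A"})
```

Because the warehouse holds only the source codes "A" and "I", each mart is free to decide what they mean without altering the audit trail.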

How do you define the data vault model? Share with us.
Dan L.


3 Responses to “Data Vault Modeling”

  1. Ron Phillips 2015/01/24 at 8:05 am

    Hi Dan, I’m wondering what experience you have applying this sort of nature model to corporate organization/governance. Thanks, Ron

  2. Dan Linstedt 2015/02/09 at 6:16 am

    Hi Ron, I am not quite sure what you are asking. If you are referring to using this methodology and all of its components to ensure corporate governance of an EDW program or project, then yes, I’ve done this many times.

    If you are referring to corporate governance of the data set IN the EDW, then yes, I have done this many times as well. In fact, if one follows the standards in the DV Modeling paradigm, it becomes necessary to institute good governance over the data it contains. Otherwise, it is nearly impossible to reach the level of auditability the corporation wants.

    Furthermore, good governance begins to enable steps in the direction of Master Data Management. But this is a discussion for another time.

    Thank you for your question; feel free to contact me directly if you have any further questions.
    Dan Linstedt

  3. Patrick Borosch 2015/03/13 at 11:22 am

    Hi Dan,

    How would you model a ‘tenant’ construct in a data vault?

    For example, you want to build an enterprise model and reuse it for different subsidiaries. They all have similar data, which has to be differentiated, especially in the reporting/analysis front end.
    Would you do this via source IDs, or would you create a Hub-Sat construct for the tenants and link it to every other hub in the model?

