The Data Vault model is only one piece of the data warehousing puzzle. It is essentially a hub-and-spoke model made up of hubs (centralized locations for business keys), links (associations across or between business keys), and satellites (the data warehousing component that holds descriptive data over time).
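To make the three building blocks concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, the column names (`load_date`, `record_source`, `name`, `city`), and the sample values are illustrative assumptions, not a prescribed layout.

```python
import sqlite3

# In-memory database; every table and column name below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hubs: one row per business key.
CREATE TABLE hub_customer (
    customer_key  TEXT PRIMARY KEY,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_order (
    order_key     TEXT PRIMARY KEY,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- Link: an association between business keys.
CREATE TABLE link_customer_order (
    customer_key  TEXT NOT NULL REFERENCES hub_customer(customer_key),
    order_key     TEXT NOT NULL REFERENCES hub_order(order_key),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    PRIMARY KEY (customer_key, order_key)
);
-- Satellite: descriptive data over time, keyed by hub key plus load date.
CREATE TABLE sat_customer (
    customer_key  TEXT NOT NULL REFERENCES hub_customer(customer_key),
    load_date     TEXT NOT NULL,
    name          TEXT,
    city          TEXT,
    PRIMARY KEY (customer_key, load_date)
);
""")

conn.execute("INSERT INTO hub_customer VALUES ('C-100', '2024-01-01', 'crm')")
conn.execute("INSERT INTO hub_order VALUES ('O-900', '2024-01-02', 'shop')")
conn.execute(
    "INSERT INTO link_customer_order VALUES ('C-100', 'O-900', '2024-01-02', 'shop')"
)
# Two satellite rows for the same business key: a change is a new row,
# not an update, so the history is preserved.
conn.executemany(
    "INSERT INTO sat_customer VALUES (?, ?, ?, ?)",
    [
        ("C-100", "2024-01-01", "Ada", "Denver"),
        ("C-100", "2024-03-01", "Ada", "Boston"),
    ],
)

history = conn.execute(
    "SELECT load_date, city FROM sat_customer "
    "WHERE customer_key = 'C-100' ORDER BY load_date"
).fetchall()
print(history)
```

The hub holds nothing but the key and its load metadata, the link holds nothing but the association, and everything descriptive sits in the satellite with its own timeline.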
We often forget that a hub-and-spoke model appears to be infinitely scalable. Because of this scalability, we can rearrange the hubs and spokes in many different ways, for different time frames. It’s as if we are evolving the hub-and-spoke model over time (which is exactly what happens when the business changes and the model changes with it). This seemingly innocuous form of data storage can be partitioned in many different ways: for instance, split across geographic regions, split by security access, or split by grain of data, to name just a few.
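As a rough illustration of that flexibility, the sketch below splits the same set of satellite rows two different ways. The row layout, the `region` and `security` attributes, and the `partition_rows` helper are all hypothetical; a real platform would partition at the storage layer rather than in application code.

```python
from collections import defaultdict

# Hypothetical satellite rows; the attribute names are assumptions.
rows = [
    {"customer_key": "C-100", "region": "EMEA", "security": "public", "city": "Paris"},
    {"customer_key": "C-200", "region": "APAC", "security": "restricted", "city": "Osaka"},
    {"customer_key": "C-300", "region": "EMEA", "security": "restricted", "city": "Berlin"},
]


def partition_rows(rows, attribute):
    """Group business keys by the value of one partitioning attribute."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[attribute]].append(row["customer_key"])
    return dict(partitions)


# The same rows, split by geography and then by security access.
by_region = partition_rows(rows, "region")
by_security = partition_rows(rows, "security")
print(by_region)
print(by_security)
```

Because each row carries its own business key, changing the partitioning scheme does not require changing the rows themselves.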
These uses of hub-and-spoke models make the Data Vault an extremely powerful design. Please keep this in mind: I didn’t create the design, I merely adapted it for data warehousing. I applied formalized rule sets and standards that make it possible to load, track, query, and use it as a data warehouse.
There are a few folks out there who like to break rules, break standards, and try to improve upon the simplicity of this model, but again, this isn’t my model. If you look out your window and see a simple-looking tree, you will see in essence a hub-and-spoke model: one central trunk, which usually splits into two major trunks, which split into major branches, followed by minor branches, followed by leaves, and so on. These types of designs are all over nature. So why shouldn’t we use a design that came from nature?
They say (the scientists who study the brain) that the brain’s neurons, synapses, and dendrites are arranged “this way” in three dimensions. Great! Put the Data Vault model in 3D and play with it. You get a very powerful result.
For the hubs and links I borrowed concepts from 3rd and 4th normal form data modeling. For the satellites I borrowed concepts from dimensional modeling and inverted the key structure. Why, then, are some people still not convinced that this is “old hat”? This is not new stuff, folks; it’s just rearranged a bit from what you are used to.
Again, the magic happens in the rules of loading and querying. The magic continues with scalability and auditability at the central core. Remember: the interpretation of memories in the human brain comes down to experience. Every person who experiences the same event remembers or interprets it differently. Therefore, in the data warehousing/BI world, we should store the data as it stands on the source system, changed only in that it is integrated by relationships, and interpret it on the way out to the data marts. This is absolutely critical to remember; I will cover more of this in a future post.
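Here is a small sketch of what "store as-is, interpret on the way out" could look like. The source names, the status codes, and the `interpret_status` and `build_mart` functions are invented for illustration; the point is only that the interpretation rule lives in the mart-building step, not in the stored rows.

```python
import copy

# Raw satellite rows, kept exactly as each (hypothetical) source delivered them.
raw_sat_customer_status = [
    {"customer_key": "C-100", "record_source": "crm", "status": "A"},
    {"customer_key": "C-100", "record_source": "billing", "status": "active"},
    {"customer_key": "C-200", "record_source": "crm", "status": "I"},
]
snapshot = copy.deepcopy(raw_sat_customer_status)

# Assumed business rule: which raw codes count as "active" for each source.
ACTIVE_CODES = {"crm": {"A"}, "billing": {"active"}}


def interpret_status(row):
    """Apply the business interpretation to one raw row."""
    return row["status"] in ACTIVE_CODES.get(row["record_source"], set())


def build_mart(rows):
    """Derive one is-active flag per customer on the way out to the mart."""
    mart = {}
    for row in rows:
        key = row["customer_key"]
        mart[key] = mart.get(key, False) or interpret_status(row)
    return mart


mart = build_mart(raw_sat_customer_status)
print(mart)
```

If the business later decides that another code also means "active", only `ACTIVE_CODES` changes and the mart is rebuilt; the stored rows remain an auditable record of what the sources actually sent.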
How do you define the Data Vault model? Share with us.