Unfortunately, the world today insists on comparing and contrasting chickens with ducks. I am in the process of writing a full compare-and-contrast white paper, which I will release soon and which I hope covers all the right points. This post covers just a few of those points. Read it first if you are going to: a) ask me to compare and contrast, b) ask me to justify the Data Vault over the star schema/dimensional bus, etc., or c) challenge my beliefs, often without justifying your own and without offering definitions or explanations of how you do things “your way.”
Anyhow: chickens with ducks. Not a very fair comparison, if you ask me.
Mistake #1: comparing the Kimball star schema with the Data Vault methodology
Proper comparison: the Kimball ETL methodology and project plan components with the Data Vault implementation methodology.
Mistake #2: comparing the Kimball bus architecture with the Data Vault model
Proper comparison: the Kimball bus architecture with the Data Vault architecture
Mistake #3: comparing the Kimball staging area with the Data Vault model
Proper comparison: there really isn’t one, but if you have to make one, then you *must* change your definition of “data warehouse” to be non-volatile and raw (or granular, as Dr. Kimball defines it). Then, and only then, can you compare dimensional modeling with Data Vault modeling.
Mistake #4: comparing the Kimball bus architecture with the Data Vault methodology
Proper comparison: the Kimball bus architecture with the Data Vault systems architecture
OK, now that we’ve cleared that up a bit, the next thing we need is definitions. To make any proper comparison at all, we first have to agree on definitions of the following:
- Staging area
- Data warehouse
- Data mart
- Systems architecture
- Data model design
- Referential integrity
- Parallelism (only because it keeps coming up; it really doesn’t belong here)
- Real-time (again, because it keeps coming up; it doesn’t belong here either)
- Big data: what does it mean to have “big data”?
- Unstructured and semi-structured data
- Business rules (how are they defined?)
Basis for the upcoming article:
The upcoming article will include definitions for each of the terms above. However, my book, Super Charge Your Data Warehouse, also contains many of these definitions, so for brevity the definitions in the article will be shortened.
The article will cover the following major categories:
- Kimball 2-tier systems architecture versus Data Vault 3-tier systems architecture
- Kimball 3-tier systems architecture (proposed only) versus Data Vault 3-tier systems architecture
- Kimball bus ETL methodology and implementation versus Data Vault ETL methodology and implementation
- Kimball’s definitions of each component versus Bill Inmon’s and the Data Vault’s definitions of the components
- Dimensional data warehouse versus Data Vault model data warehouse
- 3rd normal form data warehouse versus Data Vault model data warehouse
Note: the modeling side is not a comparison; rather, it lists the pros and cons of using each technique for the data warehouse portion of the architecture.
How you can help
I want this to be a community effort. If you have anything to contribute to this white paper, please send it as quickly as possible, either as a comment here on the forum or privately by email. I will take your statements for either side of the discussion.
Remember: the purpose of this article is not to tell people what they should choose; it is merely to compare and contrast the two.