Unfortunately the world today wants to mistakenly compare and contrast chickens and ducks. Well I am in the process of writing a full compare and contrast white paper that I will soon release that I hope goes through all the right points. This post will cover just a few points – if you are a) going to ask me to compare & contrast, b) ask me to justify the Data Vault over Star Schema/Dimensional Bus, etc.. c) going to challenge my beliefs – often without justifying your own, and without offering definitions or explanations of how you do things “your way.”
Anyhow, Chickens with Ducks. Not a very fair comparison if you ask me.
Mistake #1: Comparing the Kimball Star Schema with the Data Vault Methodology
Proper comparison: Kimball ETL methodology and project plan components with the Data Vault Implementation Methodology.
Mistake #2: Comparing the Kimball Bus Architecture with the Data Vault Model
Proper Comparison: Kimball Bus Architecture with the Data Vault Architecture
Mistake #3: Comparing the Kimball Staging Area with the Data Vault Model
Proper Comparison: There really isn’t any, but if you have to do it, then you *must* change your definition of “data warehouse” to be non-volatile and RAW (or granular as Dr. Kimball defines it). THEN and only then, you can compare dimensional modeling with Data Vault Modeling.
Mistake #4: Comparing the Kimball Bus Architecture with the Data Vault Methodology
Proper Comparison: Kimball Bus Architecture with the Data Vault Systems Architecture
Ok, now that we’ve cleared that up a bit, the next thing that needs to happen are definitions. In order to do any sort of proper comparisons at all, we have to first agree on definitions of the following:
- Staging Area
- Data Warehouse
- Data Mart
- Systems Architecture
- Data Model Design
- Referential Integrity
- Parallelism (only because it keeps coming up – it really doesn’t belong here)
- Real-Time (again, because it keeps coming up – it too doesn’t belong here)
- Big Data – what does it mean to have “big data”?
- Unstructured Data and Semi Structured Data
- Business Rules (how are they defined?)
Basis for the upcoming article:
The upcoming article will have several definitions along the above setup. However my book: Super Charge Your Data Warehouse also contains many of these definitions, so for brevity – the definitions included in the article will be shortened.
The article will contain the following major categories:
- Kimball 2 Tier Systems Architecture versus Data Vault 3 Tier Systems Architecture
- Kimball 3 Tier Systems Architecture (proposed only) versus Data Vault 3 Tier Systems Architecture
- Kimball Bus ETL Methodology & Implementation versus Data Vault ETL and Implementation
- Kimball Definitions of each of the components versus Bill Inmons and Data Vault’s definitions of the components
- Dimensional Data Warehouse versus Data Vault Model Data Warehouse
- 3rd Normal Form Data Warehouse versus Data Vault Model Data Warehouse
Note: for the modeling side, it’s not a comparison, rather it is listed as pros and cons of using each technique for the data warehouse portion of the architecture.
How you can help.
I want this to be a community effort. If you have anything to contribute to this white paper, please do so – as quickly as possible, here as a comment on the forum. OR email it to me privately. I will take your statements for either side of the discussion.
Remember; the purpose of this article is not to tell people what they should choose, it is merely to compare and contrast the two.