Can you compare and contrast a chicken with a duck?

Unfortunately the world today wants to mistakenly compare and contrast chickens and ducks.  Well I am in the process of writing a full compare and contrast white paper that I will soon release that I hope goes through all the right points.  This post will cover just a few points – if you are a) going to ask me to compare & contrast, b) ask me to justify the Data Vault over Star Schema/Dimensional Bus, etc.. c) going to challenge my beliefs – often without justifying your own, and without offering definitions or explanations of how you do things “your way.”

Anyhow, Chickens with Ducks.  Not a very fair comparison if you ask me.

Mistake #1: Comparing the Kimball Star Schema with the Data Vault Methodology

Proper comparison: Kimball ETL methodology and project plan components with the Data Vault Implementation Methodology.

Mistake #2: Comparing the Kimball Bus Architecture with the Data Vault Model

Proper Comparison: Kimball Bus Architecture with the Data Vault Architecture

Mistake #3: Comparing the Kimball Staging Area with the Data Vault Model

Proper Comparison: There really isn’t any, but if you have to do it, then you *must* change your definition of “data warehouse” to be non-volatile and RAW (or granular as Dr. Kimball defines it).  THEN and only then, you can compare dimensional modeling with Data Vault Modeling.

Mistake #4: Comparing the Kimball Bus Architecture with the Data Vault Methodology

Proper Comparison: Kimball Bus Architecture with the Data Vault Systems Architecture

Ok, now that we’ve cleared that up a bit, the next thing that needs to happen are definitions.  In order to do any sort of proper comparisons at all, we have to first agree on definitions of the following:

  1. Staging Area
  2. Data Warehouse
  3. Data Mart
  4. Systems Architecture
  5. Methodology
  6. Data Model Design
  7. Referential Integrity
  8. Parallelism (only because it keeps coming up – it really doesn’t belong here)
  9. Real-Time (again, because it keeps coming up – it too doesn’t belong here)
  10. Big Data – what does it mean to have “big data”?
  11. Unstructured Data and Semi Structured Data
  12. Business Rules (how are they defined?)

Basis for the upcoming article:

The upcoming article will have several definitions along the above setup.  However my book: Super Charge Your Data Warehouse also contains many of these definitions, so for brevity – the definitions included in the article will be shortened.

The article will contain the following major categories:

  • Kimball 2 Tier Systems Architecture versus Data Vault 3 Tier Systems Architecture
  • Kimball 3 Tier Systems Architecture (proposed only) versus Data Vault 3 Tier Systems Architecture
  • Kimball Bus ETL Methodology & Implementation versus Data Vault ETL and Implementation
  • Kimball Definitions of each of the components versus Bill Inmons and Data Vault’s definitions of the components
  • Dimensional Data Warehouse versus Data Vault Model Data Warehouse
  • 3rd Normal Form Data Warehouse versus Data Vault Model Data Warehouse

Note: for the modeling side, it’s not a comparison, rather it is listed as pros and cons of using each technique for the data warehouse portion of the architecture.

How you can help.

I want this to be a community effort.  If you have anything to contribute to this white paper, please do so – as quickly as possible, here as a comment on the forum.  OR email it to me privately.  I will take your statements for either side of the discussion.

Remember; the purpose of this article is not to tell people what they should choose, it is merely to compare and contrast the two.

Thanks,

Dan Linstedt

Tags: , , , , , , , , ,

No comments yet.

Leave a Reply

*