Can you compare and contrast a chicken with a duck?

Unfortunately, the world today wants to mistakenly compare and contrast chickens and ducks.  I am in the process of writing a full compare-and-contrast white paper, which I will release soon and which I hope covers all the right points.  This post covers just a few of them, and it is meant for you if you are going to a) ask me to compare and contrast, b) ask me to justify the Data Vault over star schema, dimensional bus, etc., or c) challenge my beliefs, often without justifying your own and without offering definitions or explanations of how you do things “your way.”

Anyhow, chickens with ducks.  Not a very fair comparison, if you ask me.

Mistake #1: Comparing the Kimball star schema with the Data Vault methodology

Proper comparison: Kimball ETL methodology and project plan components with the Data Vault implementation methodology.

Mistake #2: Comparing the Kimball bus architecture with the Data Vault model

Proper comparison: Kimball bus architecture with the Data Vault architecture

Mistake #3: Comparing the Kimball staging area with the Data Vault model

Proper comparison: There really isn’t one, but if you have to do it, then you *must* change your definition of “data warehouse” to be non-volatile and raw (or granular, as Dr. Kimball defines it).  Then, and only then, can you compare dimensional modeling with Data Vault modeling.

Mistake #4: Comparing the Kimball bus architecture with the Data Vault methodology

Proper comparison: Kimball bus architecture with the Data Vault systems architecture

OK, now that we’ve cleared that up a bit, the next thing we need is definitions.  To make any sort of proper comparison at all, we first have to agree on definitions of the following:

  1. staging area
  2. data warehouse
  3. data mart
  4. systems architecture
  5. methodology
  6. data model design
  7. referential integrity
  8. parallelism (only because it keeps coming up – it really doesn’t belong here)
  9. real-time (again, because it keeps coming up – it too doesn’t belong here)
  10. big data – what does it mean to have “big data”?
  11. unstructured data and semi-structured data
  12. business rules (how are they defined?)

Basis for the upcoming article:

The upcoming article will include several definitions along the lines of the list above.  However, my book, Super Charge Your Data Warehouse, also contains many of these definitions, so for brevity the definitions in the article will be shortened.

The article will contain the following major categories:

  • Kimball 2-tier systems architecture versus Data Vault 3-tier systems architecture
  • Kimball 3-tier systems architecture (proposed only) versus Data Vault 3-tier systems architecture
  • Kimball bus ETL methodology and implementation versus Data Vault ETL and implementation
  • Kimball’s definitions of each of the components versus Bill Inmon’s and Data Vault’s definitions of the components
  • Dimensional data warehouse versus Data Vault model data warehouse
  • 3rd normal form data warehouse versus Data Vault model data warehouse

Note: For the modeling side, it’s not a comparison; rather, it is a list of the pros and cons of using each technique for the data warehouse portion of the architecture.

How you can help.

I want this to be a community effort.  If you have anything to contribute to this white paper, please do so as quickly as possible, either here as a comment on the forum or by emailing it to me privately.  I will take your statements for either side of the discussion.

Remember: the purpose of this article is not to tell people what they should choose; it is merely to compare and contrast the two.

Thanks,

Dan Linstedt
