Mis-information about Data Vault

Apparently there is another book on the market about Data Vault modeling. While it has some interesting notions, it also makes some false claims about the nature of Data Vault modeling, and these need to be corrected. Any time there is a disagreement between what's available in the market (standards based) and what I publish as the standards, I have to say that my standards are correct, particularly because I am the author and inventor of the Data Vault, and I have had many years (since 1990) of R&D and implementation experience.

False claim #1: Multi-active Satellites are not OK

Reality: Multi-active Satellites are OK, and are in fact a necessary part of surviving in data warehousing and Data Vaults. Why? Because data sets such as phone numbers (which are described in my book Super Charge Your Data Warehouse) are part of the incoming information, and a single customer can have several phone numbers active at the same time. Multi-active Satellites can occur for a number of reasons, and the fact of the matter is that there is no good resolution except to go all the way to 6th normal form. In 6th normal form you are very close to Anchor Modeling, and at that point you are no longer using Data Vault techniques.
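To make the phone-number case concrete, here is a minimal Python sketch of a multi-active Satellite. It assumes a hypothetical customer Hub keyed by an MD5 hash of the business key, and it uses a hypothetical `phone_type` column as the multi-active key; a sequence number is another common choice. It illustrates the idea only and is not a reference implementation.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime


def hash_key(*business_keys: str) -> str:
    """Hypothetical helper: MD5 of trimmed, upper-cased business keys joined by ';'."""
    normalized = ";".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class CustomerPhoneSatRow:
    hub_customer_hk: str  # key of the parent (hypothetical) customer Hub
    load_date: datetime
    phone_type: str       # hypothetical multi-active key: tells concurrently active rows apart
    phone_number: str
    record_source: str

    def primary_key(self) -> tuple:
        # A plain Satellite would stop at (parent key, load date);
        # the multi-active Satellite adds phone_type to the key.
        return (self.hub_customer_hk, self.load_date, self.phone_type)


def load_rows(rows: list) -> dict:
    """Insert rows keyed by primary key, rejecting true duplicates."""
    table = {}
    for row in rows:
        pk = row.primary_key()
        if pk in table:
            raise ValueError(f"duplicate primary key: {pk}")
        table[pk] = row
    return table


# Usage: one customer arrives with two phone numbers in the same load.
load_ts = datetime(2013, 5, 25, 8, 0)
hk = hash_key("C100")
batch = [
    CustomerPhoneSatRow(hk, load_ts, "HOME", "555-0100", "CRM"),
    CustomerPhoneSatRow(hk, load_ts, "MOBILE", "555-0199", "CRM"),
]
sat = load_rows(batch)
plain_keys = {(r.hub_customer_hk, r.load_date) for r in batch}
print(len(sat), len(plain_keys))  # 2 rows, but only 1 distinct (parent key, load date)
```

Both rows load cleanly because the multi-active key is part of the primary key; keyed only by parent key and load date, the second number would collide with the first.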

False claim #2: Events are Hubs

Reality: 99.9% of the time, events are a cross between time, place, and object (person, service, or product). Because of this, 99.9% of the time they carry multiple shared business keys, which means they *must* be modeled as Links and Link Satellites (they cross multiple Hub keys). If you model an event as a Hub (as indicated by bar code and documented in the book), then you are saying that the event has a single business key representation, that is, a true business key of its own. Modeling an event as a Hub every time is incorrect: events do not always meet this criterion, and should not always be modeled that way.

Most of the time, events represent a combination of business keys, which means a relationship is at work. Any time a relationship is at work, the data should be modeled as a Link with a Link Satellite.
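The following is a minimal Python sketch of modeling a purchase event as a Link plus a Link Satellite instead of as a Hub. The customer (person), store (place), and product (object) Hubs, the MD5-based `hash_key` helper, and all column names are hypothetical. Carrying the event time as a Link Satellite attribute rather than as part of the Link key is one design choice among several, assumed here for simplicity.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime


def hash_key(*business_keys: str) -> str:
    """Hypothetical helper: MD5 of trimmed, upper-cased business keys joined by ';'."""
    normalized = ";".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class PurchaseLink:
    link_hk: str          # derived from ALL participating business keys
    hub_customer_hk: str  # person
    hub_store_hk: str     # place
    hub_product_hk: str   # object
    load_date: datetime
    record_source: str


@dataclass(frozen=True)
class PurchaseLinkSatRow:
    link_hk: str          # the parent is the Link, not a Hub
    load_date: datetime
    event_time: datetime  # assumed: event time kept as descriptive context
    quantity: int
    amount: float


def record_purchase(customer, store, product, event_time, quantity, amount,
                    load_date, source):
    """Build one Link row and one Link Satellite row for a purchase event."""
    link = PurchaseLink(
        link_hk=hash_key(customer, store, product),
        hub_customer_hk=hash_key(customer),
        hub_store_hk=hash_key(store),
        hub_product_hk=hash_key(product),
        load_date=load_date,
        record_source=source,
    )
    sat = PurchaseLinkSatRow(link.link_hk, load_date, event_time, quantity, amount)
    return link, sat


link, sat = record_purchase("C100", "S07", "P-4711",
                            datetime(2013, 5, 24, 17, 30), 2, 19.98,
                            datetime(2013, 5, 25), "POS")
print(link.link_hk == sat.link_hk)  # True
```

No single business key identifies the event here; its identity comes from the combination of Hub keys, which is exactly what a Link records.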

False claim #3: When a Link is not a Link

Reality: Links are always Links, sorry to say; see false claim #2 above.


If it's about the Data Vault model or implementation methodology, and it disagrees with any material I write, discuss, or publish, then please let me know by emailing me or responding to this post. I'll be happy to research it and discuss it with you.

Thank you kindly,

Dan Linstedt


5 Responses to “Mis-information about Data Vault”

  1. Andreas 2013/05/25 at 8:54 am

    Hi Dan,

    What about multi-language support? Are multi-active satellites the way to go for columns that need to be translated into several languages?

    Thanks and Regards,

  2. dlinstedt 2013/05/25 at 1:03 pm

    Multi-language support is not the same as multi-active Satellites. If you need multi-language support, you have five options:

    (not in any particular order)
    1) add fields for the separate strings in different languages
    2) create a separate satellite to store an additional language
    3) You *could* use a multi-active Satellite, and add a language key as part of the PK of the Satellite, but I'm not too keen on this option
    4) Translate the languages on the way to the data marts (one of the best practices, and the most recommended)
    5) Translate the languages on the way out of the data marts, on the way to the BI tool

    Option #4 is generally used and recommended.
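    As an illustration of option #4, here is a minimal Python sketch in which the Data Vault keeps the untranslated code and the translation happens while the data mart is built. The `TRANSLATIONS` lookup, the column names, and the fallback value "?" are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductSatRow:
    product_bk: str
    status_code: str  # stored untranslated in the Data Vault


# Hypothetical translation lookup: (code, language) -> display text.
TRANSLATIONS = {
    ("ACT", "en"): "Active",
    ("ACT", "de"): "Aktiv",
    ("DIS", "en"): "Discontinued",
    ("DIS", "de"): "Eingestellt",
}


def build_product_dim(rows, language, default="?"):
    """Resolve codes to display text while loading the mart (option #4)."""
    return [
        {
            "product_bk": r.product_bk,
            "status": TRANSLATIONS.get((r.status_code, language), default),
        }
        for r in rows
    ]


sat = [ProductSatRow("P-4711", "ACT"), ProductSatRow("P-0815", "DIS")]
print(build_product_dim(sat, "de"))
# [{'product_bk': 'P-4711', 'status': 'Aktiv'}, {'product_bk': 'P-0815', 'status': 'Eingestellt'}]
```

    Because the vault itself never changes when a language is added, only the lookup and the mart load need to know about languages.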

    Hope this helps,
    Dan L

  3. Andreas 2013/05/26 at 12:42 am

    Many thanks.

  4. Martijn Evers 2013/05/27 at 3:18 am

    The rules for multi-active sats and translations aren't that hard, but they depend on the source data and on how you handle lookup/cross-reference tables.

    For translating lookup codes I prefer to do translations in the lookup lists.
    In other cases I usually follow the source system for source sats. For non-source-driven sats I use lookup lists, and when it is not a code translation I use a multi-active sat with a derived filtering sat for each language (a combination of options 2 and 3).
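    One possible reading of the combination of options 2 and 3, as a minimal Python sketch: descriptions are stored once in a multi-active sat with a language key (option 3), and a per-language sat (option 2) is derived by filtering. All names, including the placeholder hash key "HK-P4711", are hypothetical, and whether the derived sats are physical tables or views is left open.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ProductDescSatRow:
    hub_product_hk: str  # hypothetical parent Hub key
    load_date: datetime
    language: str        # multi-active key (option 3)
    description: str


def derive_language_sat(rows, language):
    """Derived filtering sat: keep one language, keyed like a plain sat."""
    derived = {}
    for r in rows:
        if r.language != language:
            continue
        key = (r.hub_product_hk, r.load_date)
        if key in derived:
            raise ValueError(f"more than one {language} row for {key}")
        derived[key] = r.description
    return derived


ts = datetime(2013, 5, 27)
ms_sat = [
    ProductDescSatRow("HK-P4711", ts, "en", "Blue mug"),
    ProductDescSatRow("HK-P4711", ts, "de", "Blaue Tasse"),
]
sat_de = derive_language_sat(ms_sat, "de")
print(len(sat_de), sat_de[("HK-P4711", ts)])  # 1 Blaue Tasse
```

    Filtering on the language removes the multi-activity, so each derived sat has the ordinary (parent key, load date) key.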

    Note that the question of where to store translations is different from the question of which layer to resolve them in (DV, DM, BI).

  5. dlinstedt 2013/05/27 at 4:33 am

    Hi Martijn,

    I agree with your interpretation. Thanks for your comment.

    Dan L

Leave a Reply