Foreign Keys in Satellites?

I’ve written about this danger many times; in fact, you can find some great postings and information about this subject on the “Data Vault Discussions” forums, but that’s beside the point. There are no exceptions to this rule (as I see it). Foreign keys simply should never exist within a satellite. To do so severely damages the potential flexibility of the Data Vault model. Let’s chat about some of the mathematics of maintenance behind this idea.

When we look at foreign keys, it is easy to think: “Gee, it describes what the parent key seems to be at the moment,” while at the same time it represents a link, a join, a relationship to another key or to other pertinent data. It’s easy to fall down the slippery slope of simply deciding to add foreign keys to the satellite structures. There are all kinds of arguments as to “why” this can or should be done, but no matter what the argument, the fact remains: foreign keys in 3NF tables represent a relationship to other data.

In a data warehouse these foreign-keyed relationships change over time, and maintaining that history is essential for auditability and traceability. However, this reasoning alone doesn’t negate the “desire” to put FKs into the satellite.

So what exactly is this driving force that keeps me from saying “OK”?

Let’s discuss! By the way, please remember there are 10 years of R&D behind these rules and standards, and I hope that I’ve tested the architecture in enough different situations to understand the most common outcomes of architecture changes. There is always a chance (a pretty good chance) that I missed something along the way, so if you think of something, go ahead and comment as a reply, and I’ll let you know my experience. Now, on to the question at hand.

When we think about data models we often conveniently forget about complexity ratings. That is: complexity ratings of the data model itself in relation to ETL/ELT loading processes, BI query processes, and of course maintenance of the data model.

Moreover, a data warehouse doesn’t have a simple complexity rating!

Why? Because a data warehouse stores data over time. That means any time there is a structural change, it impacts all future data (yet to arrive) and all past data already stored in the EDW. It is because of this “past” data set that we need to consider the complexity rating in conjunction with a multiplier effect. In other words, the complexity rating of a data model without historical data might be “3” on a scale of 1 to 10, whereas the complexity rating (for absorbing changes) of an EDW (because of history) would be 3^2 or 3^3. All of a sudden the complexity rating is at the top of the chart or well off it, i.e. a 9 or a 27 on a scale of 1 to 10.
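The arithmetic above can be sketched in a few lines. This is only an illustration of the example numbers, not a formal metric: the function name `change_complexity` and the idea of passing history in as an exponent are assumptions made for the sketch.

```python
SCALE_MAX = 10  # the 1-to-10 scale used in the example above


def change_complexity(base_rating: int, history_exponent: int) -> int:
    """Hypothetical helper: raise the base rating to a power standing in for history."""
    return base_rating ** history_exponent


for exponent in (1, 2, 3):
    rating = change_complexity(3, exponent)
    print(f"3^{exponent} = {rating}, exceeds scale: {rating > SCALE_MAX}")
```

With a base rating of 3, the exponent 1 case is the model without history; exponents 2 and 3 give the 9 and 27 quoted above.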

This is because the number of items impacted by the change doubles or triples. Then there is the “data”: what to do about the data that is already stored? Some people fall down this slope and say: “That’s easy, just add the new foreign key to the satellite and make it optional.” Well, optional foreign keys (even in OLTP tables without history) increase the complexity rating on a factorial basis.

To understand this, we look at the complexity ratings used to measure level of effort in maintaining code. In a programmatic sense, what happens to the complexity rating of a procedure/function when a “decision” or condition is introduced? It makes the code inside the condition optional, based on some data-driven element. The complexity rating increases. There are tons of formulas to describe these effects in costs per defect, costs per option/decision, performance per decision or “branch”, etc.
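One widely used rating of this kind is McCabe's cyclomatic complexity, which is roughly one plus the number of decision points in a routine. The sketch below is a simplified approximation of that count, not a full implementation, and is not necessarily the formula the author has in mind. The two `describe` routines and their column names are hypothetical stand-ins for BI code that reads a satellite row.

```python
import ast

# Node types treated as decision points in this simplified count.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def approx_complexity(source: str) -> int:
    """Rough McCabe-style rating: 1 + number of decision points in the source."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths
            decisions += len(node.values) - 1
    return 1 + decisions


# Hypothetical reader for a satellite with no optional foreign keys.
NO_OPTIONAL_FK = """
def describe(row):
    return row["name"]
"""

# Hypothetical reader once two optional foreign keys have been added.
TWO_OPTIONAL_FKS = """
def describe(row):
    if row["salesperson_key"] is None:
        return row["name"]
    if row["salesterritory_key"] is None:
        return row["name"] + " / salesperson only"
    return row["name"] + " / salesperson and territory"
"""

print(approx_complexity(NO_OPTIONAL_FK))
print(approx_complexity(TWO_OPTIONAL_FKS))
```

Each optional key adds a branch to every routine that reads the row, so the rating of the reading code grows with every optional column.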

The satellites are the same. Don’t introduce optional decision making to a satellite! Why? Because the foreign key structure in the satellite represented by the data of yesterday will not equal the foreign key structure in the satellite represented by the data of today. This forces BI code to become far more complex than it needs to be to account for timeline breaks.

But it gets worse. All of a sudden the metadata (meaning/definition) and the “grain” of the data in the satellite are called into question. For instance, suppose you had a satellite off customer today, and it had a foreign key to salesperson, and that FK was optional. What does the customer data mean? Well, if it has a foreign key filled in, then it means a salesperson sold that customer? OK, but what happens if another foreign key is added to the satellite: salesterritory? Now, what does the “old data” mean if it doesn’t have a salesterritory? Does it mean we never got the data? Does it mean we got the data on the feed but ignored it? Does it mean that no one entered the salesterritory in today’s data?

What exactly does the FK represent? Does the FK mean the customer is in a particular sales territory, or does it mean it’s the sales territory of the salesperson, but only when the salesperson FK is filled in?
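The ambiguity in these questions can be reproduced with a small sketch. Everything here is an assumption made for illustration: the table name `sat_customer`, the column names, the sample rows, and the use of an in-memory SQLite database simply to keep the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sat_customer (
    customer_key    INTEGER NOT NULL,
    load_date       TEXT    NOT NULL,
    name            TEXT,
    salesperson_key INTEGER,          -- optional FK buried in the satellite
    PRIMARY KEY (customer_key, load_date)
);
INSERT INTO sat_customer VALUES (1, '2010-01-01', 'Acme', 7);
INSERT INTO sat_customer VALUES (2, '2010-01-01', 'Globex', NULL);

-- Later structural change: a second optional FK is added.
ALTER TABLE sat_customer ADD COLUMN salesterritory_key INTEGER;
INSERT INTO sat_customer VALUES (1, '2010-04-01', 'Acme', 7, 30);
INSERT INTO sat_customer VALUES (3, '2010-04-01', 'Initech', 9, NULL);
""")

# Rows that predate the new column and rows where the value was genuinely
# missing on the feed come back looking exactly the same.
rows_without_territory = conn.execute("""
    SELECT customer_key, load_date
    FROM sat_customer
    WHERE salesterritory_key IS NULL
    ORDER BY customer_key, load_date
""").fetchall()
print(rows_without_territory)
```

The query returns both the rows loaded before the column existed and the later row with no territory supplied. Nothing in the satellite says which case is which, so the BI code has to carry the change date and branch on it.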

As you can see, the complexity of deciphering all of these questions begins to raise the maintenance cost of the Data Vault model, something we all want to avoid. If we put the foreign keys in links, then “yesterday’s link A” has customer and salesperson, and a new link (if the data truly means “a salesperson sold this customer in this territory”) would contain customer, salesperson, and salesterritory (from this date forward).
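A minimal sketch of the link-based layout just described, again using an in-memory SQLite database only so the example runs. The hub, satellite, and link table names, the surrogate key columns, and the sample rows are all hypothetical, and the structures are stripped down to what the example needs (no record source or hash keys, for instance).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE hub_customer       (customer_key       INTEGER PRIMARY KEY, business_key TEXT NOT NULL UNIQUE);
CREATE TABLE hub_salesperson    (salesperson_key    INTEGER PRIMARY KEY, business_key TEXT NOT NULL UNIQUE);
CREATE TABLE hub_salesterritory (salesterritory_key INTEGER PRIMARY KEY, business_key TEXT NOT NULL UNIQUE);

-- The satellite carries only its parent key plus descriptive data.
CREATE TABLE sat_customer (
    customer_key INTEGER NOT NULL REFERENCES hub_customer (customer_key),
    load_date    TEXT    NOT NULL,
    name         TEXT,
    PRIMARY KEY (customer_key, load_date)
);

-- "Yesterday's link A": customer and salesperson.
CREATE TABLE link_customer_salesperson (
    link_key        INTEGER PRIMARY KEY,
    customer_key    INTEGER NOT NULL REFERENCES hub_customer (customer_key),
    salesperson_key INTEGER NOT NULL REFERENCES hub_salesperson (salesperson_key),
    load_date       TEXT    NOT NULL
);

-- New link from the change date forward; link A is left untouched.
CREATE TABLE link_customer_salesperson_territory (
    link_key           INTEGER PRIMARY KEY,
    customer_key       INTEGER NOT NULL REFERENCES hub_customer (customer_key),
    salesperson_key    INTEGER NOT NULL REFERENCES hub_salesperson (salesperson_key),
    salesterritory_key INTEGER NOT NULL REFERENCES hub_salesterritory (salesterritory_key),
    load_date          TEXT    NOT NULL
);

INSERT INTO hub_customer       VALUES (1, 'CUST-1');
INSERT INTO hub_salesperson    VALUES (7, 'SP-7');
INSERT INTO hub_salesterritory VALUES (30, 'TERR-30');
INSERT INTO sat_customer       VALUES (1, '2010-01-01', 'Acme');
INSERT INTO link_customer_salesperson           VALUES (1, 1, 7, '2010-01-01');
INSERT INTO link_customer_salesperson_territory VALUES (1, 1, 7, 30, '2010-04-01');
""")

# Every key in a link is mandatory, so a partial relationship is rejected
# instead of being stored as an ambiguous NULL.
try:
    conn.execute(
        "INSERT INTO link_customer_salesperson_territory "
        "VALUES (2, 1, 7, NULL, '2010-04-02')"
    )
    partial_row_rejected = False
except sqlite3.IntegrityError:
    partial_row_rejected = True

print(partial_row_rejected)
```

In this layout the old relationship and the new one each keep a single, fixed meaning, and the satellite's definition never changes when a relationship is added.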

There is a distinct difference between segregating the relationships out and “storing them with the satellites”. What you are effectively doing (when you put an FK in a satellite) is “overloading” the definition of the satellite (to use a coding term).

When you overload, you multiply the technical and business definitions, along with applying a factor to the complexity of understanding the information. This is a bad, bad practice.

Do not put foreign keys in your satellites! If you do, you do not have a Data Vault! Furthermore, you will not inherit the benefits of the Data Vault going forward, and it will eventually cause you to redesign the whole EDW from the ground up because maintenance costs will spiral out of control.

There are many other technical reasons, but this is the major business reason.

Hope this helps clear the air,
Dan L

8 Responses to “Foreign Keys in Satellites?”

  1. Roelant Vos 2010/04/29 at 1:16 am #

    Hi Dan,

    Would you consider adding foreign keys to -DWH governed- masterdata tables to a satellite? Things like internal classifications or codes (but containing a start and end date!).

    I was thinking of using this kind of trick to classify the type of relationship in a link satellite table. This (foreign key) attribute would refer to a limited set of relationship types in some sort of reference table. The reference table could theoretically change over time but I don’t see the impact on the DWH here because you would always select on the validity time period.

    Because the data is no real source but part of the DWH I figured the exception could be made. The alternative would be to create a big number of link tables.

    Kind regards,
    Roelant Vos

  2. dlinstedt 2010/04/29 at 2:49 am #

    I consider classification codes and type codes and the like to be descriptive data. This is the *one* instance that I allow a “logical” (not physical) foreign key to be embedded in a Satellite. Also because the type code is a role-playing descriptive attribute. I believe I blogged about Type Codes recently, if not, I’ll post something more concrete to this effect.

    So in your case, yes – type codes are allowed. What I’m specifically referring to with no FK’s (or complaining about) is the act of burying business key to business key relationships in the Satellites; this is a major no-no. As we move along I’ll try to blog some specific examples of the cause/effect and impact analysis of why this is such a big hang-up for me, and how it can lead to re-engineering in the near future.

    Hope this helps, let me know if I didn’t answer your question.
    Dan L

  3. M. Streutker 2010/08/02 at 8:32 am #

    Hi Dan,

    after reading this article I was a bit confused about the usage of foreign keys in a datavault.
    In part 2 of your Data Vault Series, you give an example of the Northwind Database as a physical Data Vault Model.
    This example includes a lot of foreign keys on satellite level, i.e. the SAT_ORDERS satellite includes keys to EMPLOYEEID, CUSTOMERID, etc.

    How is that sample different from the advice you give in this article (don’t put FK’s in your satellites)? Or is that sample a bit outdated?

    Kind regards,

    M. Streutker

  4. dlinstedt 2010/08/02 at 6:59 pm #

    Hi M.

    The advice I gave in part 2 of my DV series (sorry to say) was a little incorrect. Part 2 was written in 2001, and since then I’ve performed a lot of testing on the architecture, and I have discovered that introducing foreign keys to Satellites (other than the parent key) causes great re-engineering efforts to take place every time a change is desired by the business. This is opposite of what we truly hope to achieve.

    The sample / example is quite out-dated.

    Hope this helps,
    Dan Linstedt

  5. M. Streutker 2010/08/03 at 12:50 am #

    Hi Dan,

    thank you for your clarification and quick response.
    It’s good to see that you continue to improve the architecture. I’ll keep in mind that the original series might be a bit outdated in certain areas.

    Kind regards,

    M. Streutker

  6. dlinstedt 2010/08/03 at 4:25 am #

    Hi M,

    You’re welcome. It’s this kind of knowledge (and more) that are embedded in my coaching section of this site. Let me know if you’re interested in signing up. You have a unique opportunity at this time to get custom material built to answer your specific questions.

    Dan L

  7. Basil Peace 2015/10/09 at 10:48 am #

    What if my source is a relational database, and it has a many-to-one relationship? This relationship is considered an attribute.
    My entities are the entities from this source system, so I’m sure that nothing changes before changing the source system (which would be very expensive and almost improbable). Even if that would happen, entire data vault should be remodeled.
    1. Could I model this specific case as attributes (satellites) instead of relationship (link)?
    2. Can I place FK on satellites in this specific case?

  8. Dan Linstedt 2015/12/09 at 7:39 am #

    Hi Basil,

    These concepts have been discussed at length here on my blog, as well as in the LinkedIn discussion groups: “Data Vault Discussions”. I actually teach these concepts in my CDVP2 (Data Vault 2.0 Boot Camp and Private Certification) class. You can also dive in deep to these concepts by reading our new book: “Building a scalable Data warehouse with Data Vault 2.0” available on today.

    The short answers are as follows:
    1. No – you cannot break the rules, to do so would damage the flexibility of the architecture
    2. No – you can *never* put other FK’s on Satellites in these cases, they would not be satellites but Links for these reasons. Again, it breaks the flexibility of the model.

    Hope this helps,
