Pirate Peg Leg

#datavault Pirates, Peg Leg Links and Business Keys

when are hubs & links like peg-legged pirates of old? and what do composite business keys have to do with pirate treasure gold?

the more i go, the less i like the idea of composite business keys being candidates for hubs.  here, i present a source system scenario, and explain how it plays out in creating a data vault as far as creating hubs go. in my mind, i can only find a few small instances where composite business keys truly make sense.  perhaps, just maybe, i am leaning towards anchor modeling for the business keys?

in any case, if you have examples of where you’ve successfully used composite business keys in hubs, and why the make sense, i’d love to hear from you.  please post your answers below in the comments section.

the problem (as i see it)

let’s take an example shall we? note, this is pseudocode!  notice the use of business key declaration in the ddl – makes it invalid as a proper ddl example.

create table contract (
   contractnum int not null,
   revisionmum int not null,
   business key (contractnum, revisionnum)

in reality: what does this key represent? and what is the true business key? and why is this a problem?

well, for starters: there can and should be one and only one contract number for the whole system.  just imagine, if there were the same contract numbers assigned to different customers – but because of the revision number, it made it unique!

so: tom has contract #12345 revision 2, and mary has contract #12345 revision 5 – totally valid.

yet from a business standpoint, this *should* be an invalid business case!!  tom and mary should have separate contract numbers, to be represented properly.

yet: if you look closer, revisionnum is a weak business key.  by itself it has no meaning.  in other words, revision 5.  who has revision 5? what is revision 5?  well, in the data set – any “contract” that has been revised 5 times or more will have a revision number of 5…

so is this a sticky case?  yes.  can it be easily resolved? not without human intervention.

the mechanical solution

for those of you using a tool to generate data vaults, the mechanical solution is also sticky, but necessary.  there are a series of composite rules that rapidace v2 follows to produce a much better data vault than rapidace v1.  rapidace v1 splits both fields in to single hub keys – without regard to anything else.  i can’t comment on quipu, or bi-ready, because i don’t know what they do in this situation.

in wherescape, the human is partly involved in making a decision on structure – which is where this responsibility should lie.

(by the way, there is a post on linkedin, asking about this very condition – using different fields)


so: the mechanical solution: to produce what is given to you – with a number of different rules guiding the production of hubs.  if you want to see the results of these rules, then you will have to use rapidace v2 saas (when made public) in order to do it.  signing up to be a paying saas customer will provide you with the user guide where the examples are documented in detail.

for the short answer: rapidace v2 would *most likely* produce this set of fields as a single composite hub.

the manual answer:

if you are doing this manually, then i would suggest you examine the rest of the model – and ask yourself the following questions:

  • is “your field” (like revisionnum) a dependent child?  if so, then most likely it will become a composite hub
  • do you have a need or requirement to have a unique list of contract numbers?  if so, then you might break the hub out for contract number, then: (here’s the odd part), you would end up with a link that houses contract sequence and revision number – since revision number is weak – you wouldn’t model a hub for it.  this would lead to a peg-legged link.

ahoy there matey!

anyhow, the satellite (if there are additional fields in the table) would hang from the peg-legged link if you decided to create one.  otherwise, if you use a composite business key hub, then the satellite would hang from it.

when to break apart composite bk’s

sometimes it is necessary to break apart the business keys, especially if they are smart keys; and this is where we start to go down the path of 6th normal form, or anchor modeling.  however the trade-off (there is always a trade-off) is that you end up with more joins and more tables, but if you’ve switched to a k=v, triple store, or columnar datastore, then this will not matter to you – because your dv model will become more logical control over your structures anyhow.

now, if you can or want to buy in to the trade-off -then a funny thing can happen: the more parts of the data model that rely on that key, the more likely that satellites can fold their relationship and hang from that instance of the hub.  meaning “less overall join tables”…. but be careful… the more composite keys that exist in the model, the more link tables will be formed.

smart key = where the entire key is broken in to segments, and each segment has meaning.

what’s our heading captain?

well, the compass points north (at least for now), and what it says is this:

if the composite business key is not a part of any foreign key (meaning it has no “parents”) then, it shall remain intact as a composite business key in the hub, rather than fishing around for additional link tables and catching a whale of a tale!

if you decide that there is value in breaking it up, then do so – just realize you might end up with a peg-legged link!

what are your thoughts? comments?  love to hear from you.

dan linstedt
ps: you can read all about degenerate or weak business keys/child dependent keys in my book: super charge your edw which you can get here: http://datavaultalliance.com/purchase-book


Tags: , , , , , ,

8 Responses to “#datavault Pirates, Peg Leg Links and Business Keys”

  1. Kent Graziano 2012/05/03 at 7:58 am #

    I am with Dan on this one – I have rarely seen a valid composite business key that did not end up really being a Link in the end. My very first data vault at Denver Public Schools ended up with a few peg-leg links. I remember we were struggling with what to do with some of the data and wrapping our minds around the unit-of-work concept. We had Dan come in to help us out and low and behold the peg-leg link was born. Problem solved (arrr!).

    The key was really looking at the components of the business key and asking the basic questions – should we be able to get a unique list of w/o reference to and are there attributes related strictly to that we need to track that do not vary with . If yes, then you need to break them up and form a separate hub.

  2. Wojciech Zawadzki 2015/05/08 at 2:22 am #

    In insurance, an installment (pay date, value, etc …) does not stand alone (so can’t be a hub) – is uniquely identified by business key policy reference (policy number) which is by all means a true hub, and a serial no (1,2,3,..) dependent on the number of installments. Installment is then a peg-legged link between a policy hub and weak business key serial no, which by no means qualifies for a hub.

  3. Dan Linstedt 2015/05/08 at 2:13 pm #

    Thank-you for your comment. I appreciate hearing from customers, and welcome the feedback. This is good to know. Please keep me updated as to the progress of your DV efforts going forward.

    Dan Linstedt

  4. Greg Locke 2017/09/15 at 4:18 pm #

    Hi Dan;
    I came across a scenario recently where the client wanted to integrate design data from multiple projects into a single enterprise vault. The challenge was the client created a new instance of each source system for each new Project and Design Object ID’s were guaranteed to clash. Sometimes the Design Object ID’s were copied from one Project to another. So, while the Project was a Parent Hub on it’s own, the Design Object Hub could not stand on it’s own.
    A similar scenario would occur when a multinational company was merging a chart of accounts or ERP systems from multiple subsidiaries. Then Company becomes a requirement for composite business keys, Account Codes, Employee IDs, etc. In larger insurance companies that have multiple policy administration systems, Company is a typical component of the Insurance policy key which ties back to the books for that business.
    In the above scenarios, I would vote for a Hub composite key. Is there a better alternative?

  5. Dan Linstedt 2017/09/25 at 12:40 pm #

    Hi Greg,

    In this scenario we can do two things: 1) include a company ID (as you say, is required), or 2) a multi-tenant ID (same type of deal, resolved using reference data). There really isn’t a better alternative, as they need to be stored in the same Hub for integration purposes. Then, when the accounts are supposed to combine, apply a same-as link to it.

    Hope this helps,
    Dan Linstedt
    PS: we teach this and more in the CDVP2 class

  6. Cecil Jones 2018/03/12 at 2:25 pm #

    In our source systems we have a record that is based on two business keys, Invoice_No and Order_No. Both are business keys for used by the company. An Invoice can contain 1 to n orders AND an order can be on multiple invoices. The source system is modeled that these to values, Invoice_No and Order_No are the unique key on the table.

    I think this would be a situation where I would create a Hub for the Invoice and a Hub for the Order then create a Link table on Invoice_No and Order_No to hang the satellite off of it.

  7. Mathias 2020/03/21 at 1:58 pm #

    Hi Dan,

    so your solution technically means that we break the Hub containing the CompositeBK into one Hub containing the contractnum and one Link containing contractnum and revisionnum? I assume this will work only when there are no other Links connected to this Hub, right?

    Best Wishes

  8. Dan Linstedt 2020/04/08 at 7:59 am #

    Hi Mathias,
    No, that’s not what it means. This article was posted a long time ago. The peg-leg link is a MISTAKE in data vault modeling, and we should never ever be building it. The article highlighted what to do in an edge case. Unfortunately this edge case is a direct result of “bad source modeling” in the operational system. Hope this helps, Dan

Leave a Reply