when are hubs & links like peg-legged pirates of old? and what do composite business keys have to do with pirate treasure gold?
the more i go, the less i like the idea of composite business keys being candidates for hubs. here, i present a source system scenario and explain how it plays out when creating hubs for a data vault. in my mind, i can find only a few small instances where composite business keys truly make sense. perhaps, just maybe, i am leaning towards anchor modeling for the business keys?
in any case, if you have examples of where you’ve successfully used composite business keys in hubs, and why they make sense, i’d love to hear from you. please post your answers below in the comments section.
the problem (as i see it)
let’s take an example, shall we? note: this is pseudocode! notice the business key declaration in the ddl, which makes it invalid as a proper ddl example.
create table contract (
    contractnum int not null,
    revisionnum int not null,
    business key (contractnum, revisionnum)
)
in reality: what does this key represent? and what is the true business key? and why is this a problem?
well, for starters: there can and should be one and only one contract number for the whole system. just imagine if the same contract number were assigned to different customers, and only the revision number made each row unique!
so: tom has contract #12345 revision 2, and mary has contract #12345 revision 5. under the composite key, both rows are totally valid.
yet from a business standpoint, this *should* be an invalid business case!! tom and mary should have separate contract numbers, to be represented properly.
and if you look closer, revisionnum is a weak business key. by itself it has no meaning. take revision 5: who has revision 5? what is revision 5? well, in the data set, any “contract” that has been revised 5 times or more will have a row with a revision number of 5…
so is this a sticky case? yes. can it be easily resolved? not without human intervention.
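to make the problem concrete, here is a minimal sketch using python's built-in sqlite3 module. it assumes the pseudocode table above is declared with a real composite primary key, and it adds a customer column plus a third row (ann, contract #67890) purely for illustration; neither is part of the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    create table contract (
        contractnum integer not null,
        revisionnum integer not null,
        customer    text    not null,  -- assumed column, for illustration only
        primary key (contractnum, revisionnum)
    )
    """
)

# all three rows are accepted: the composite key only checks the pair
conn.executemany(
    "insert into contract values (?, ?, ?)",
    [(12345, 2, "tom"), (12345, 5, "mary"), (67890, 5, "ann")],
)

# one contract number, two different customers
owners = [
    row[0]
    for row in conn.execute(
        "select distinct customer from contract "
        "where contractnum = 12345 order by customer"
    )
]

# "revision 5" on its own spans unrelated contracts
rev5 = [
    row[0]
    for row in conn.execute(
        "select contractnum from contract "
        "where revisionnum = 5 order by contractnum"
    )
]

print(owners, rev5)
```

nothing in the key stops tom and mary from sharing a contract number, and asking for revision 5 alone returns rows that have nothing to do with each other.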
the mechanical solution
for those of you using a tool to generate data vaults, the mechanical solution is also sticky, but necessary. there are a series of composite rules that rapidace v2 follows to produce a much better data vault than rapidace v1. rapidace v1 splits both fields into single hub keys, without regard to anything else. i can’t comment on quipu or bi-ready, because i don’t know what they do in this situation.
in wherescape, the human is partly involved in making a decision on structure – which is where this responsibility should lie.
(by the way, there is a post on linkedin, asking about this very condition – using different fields)
so, the mechanical solution: produce what is given to you, with a number of different rules guiding the production of hubs. if you want to see the results of these rules, then you will have to use rapidace v2 saas (when made public) in order to do it. signing up to be a paying saas customer will provide you with the user guide where the examples are documented in detail.
for the short answer: rapidace v2 would *most likely* produce this set of fields as a single composite hub.
the manual answer:
if you are doing this manually, then i would suggest you examine the rest of the model – and ask yourself the following questions:
- is “your field” (like revisionnum) a dependent child? if so, then most likely it will become a composite hub
- do you have a need or requirement to have a unique list of contract numbers? if so, then you might break out a hub for contract number. then (here’s the odd part) you would end up with a link that houses the contract sequence and the revision number; since revision number is weak, you wouldn’t model a hub for it. this would lead to a peg-legged link.
ahoy there matey!
anyhow, the satellite (if there are additional fields in the table) would hang from the peg-legged link if you decided to create one. otherwise, if you use a composite business key hub, then the satellite would hang from it.
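the two manual options can be sketched side by side. this is only an illustration in python with sqlite3: the table and column names (hub_contract_rev, hub_contract, lnk_contract_revision, the _sqn columns) are made up for this example, and the load date, record source, and satellite tables are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- option a: one hub keyed on the composite business key
    create table hub_contract_rev (
        contract_rev_sqn integer primary key,
        contractnum      integer not null,
        revisionnum      integer not null,
        unique (contractnum, revisionnum)
    );

    -- option b: a hub on contract number alone ...
    create table hub_contract (
        contract_sqn integer primary key,
        contractnum  integer not null unique
    );
    -- ... plus a peg-legged link: one hub reference and the weak key
    create table lnk_contract_revision (
        contract_rev_sqn integer primary key,
        contract_sqn     integer not null references hub_contract (contract_sqn),
        revisionnum      integer not null,
        unique (contract_sqn, revisionnum)
    );
    """
)

rows = [(12345, 2), (12345, 5), (67890, 5)]  # (contractnum, revisionnum)

# option a: every (contract, revision) pair is its own hub row
conn.executemany(
    "insert into hub_contract_rev (contractnum, revisionnum) values (?, ?)", rows
)

# option b: contract numbers load once, revisions load into the link
conn.executemany(
    "insert or ignore into hub_contract (contractnum) values (?)",
    [(c,) for c, _ in rows],
)
conn.executemany(
    """
    insert into lnk_contract_revision (contract_sqn, revisionnum)
    select contract_sqn, ? from hub_contract where contractnum = ?
    """,
    [(r, c) for c, r in rows],
)


def count(table):
    return conn.execute(f"select count(*) from {table}").fetchone()[0]


counts = {
    t: count(t)
    for t in ("hub_contract_rev", "hub_contract", "lnk_contract_revision")
}
print(counts)
```

with option b you get the unique list of contract numbers (two rows in hub_contract), at the cost of an extra table and a join. a satellite would reference contract_rev_sqn in either design.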
when to break apart composite bk’s
sometimes it is necessary to break apart the business keys, especially if they are smart keys; this is where we start to go down the path of 6th normal form, or anchor modeling. the trade-off (there is always a trade-off) is that you end up with more joins and more tables. if you’ve switched to a k=v, triple store, or columnar datastore, this will not matter to you, because your dv model will become more of a logical control over your structures anyhow.
now, if you can or want to buy into the trade-off, then a funny thing can happen: the more parts of the data model that rely on that key, the more likely that satellites can fold their relationship and hang from that instance of the hub, meaning “fewer overall join tables”. but be careful: the more composite keys that exist in the model, the more link tables will be formed.
smart key = a key that is broken into segments, where each segment has meaning.
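as a small illustration of a smart key, here is a sketch in python. the 'region-year-serial' layout and the sample value are entirely hypothetical; a real smart key follows whatever segment rules the source system defines.

```python
from typing import NamedTuple


class SmartKeyParts(NamedTuple):
    region: str
    year: int
    serial: int


def split_smart_key(key: str) -> SmartKeyParts:
    """split a hypothetical 'region-year-serial' smart key into its segments."""
    region, year, serial = key.split("-")
    return SmartKeyParts(region, int(year), int(serial))


parts = split_smart_key("us-2011-00042")
print(parts)
```

each segment carries its own meaning, so each one can be modeled separately once the key is broken apart, which is exactly where the extra tables and joins come from.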
what’s our heading captain?
well, the compass points north (at least for now), and what it says is this:
if the composite business key is not a part of any foreign key (meaning it has no “parents”), then it shall remain intact as a composite business key in the hub, rather than fishing around for additional link tables and catching a whale of a tale!
if you decide that there is value in breaking it up, then do so – just realize you might end up with a peg-legged link!
what are your thoughts? comments? love to hear from you.
ps: you can read all about degenerate or weak business keys/child dependent keys in my book: super charge your edw which you can get here: http://datavaultalliance.com/purchase-book