Technical Modeling Book is Coming….

i’ve talked about it for at least the past 2 years.  i’m posting the table of contents so you know it’s being written.  i’ll also post the current list of figures and a couple of excerpts going forward.

i am considering offering subscription levels for access to the book while it’s being written.  if you think you are interested, send me an email telling me: 1) how much you might pay per month for access, and 2) what you expect to get in return.

please keep in mind that this is a temporary table of contents, and the titles as well as the content are subject to change.  this book is highly focused on data modeling for the data vault.  (it is not yet available; i’m hoping it will be available before the end of 2010.)

if you think i missed anything, please comment on this blog entry.  if you think there should be a specific example case to use in the book, please forward the details to me.  if you contribute a worthy case study + definitions / example and i use it in the book, you will receive a free copy of the finished work.


the purpose of this book is to present and discuss the technical components of the data vault model.  the examples in this book provide a strong foundation for how to design and build structures in data vault modeling.  this book is the second book in the series surrounding the data vault model and methodology (approach).  the target audience is anyone wishing to implement a data vault model for integration purposes, whether it be an enterprise data warehouse, operational data warehouse, or dynamic data integration store.

table of contents

1.0     introduction and terminology
1.1      do i need to be a data modeler to read this book?
1.1.1      foundational data vault certification
1.2      review of basic terminology
1.3      notations used in this text
1.4      data models as ontologies
1.5      data model naming conventions and abbreviations
1.6      introduction to hubs, links, and satellites
1.7      flexibility of the data vault model
1.8      data vault basis of commutative properties and set based math
1.9      loading processes: batch versus real time

2.0     architectural definitions
2.1      staging area
2.2      edw – data vault
2.3      metrics vault
2.4      meta vault
2.5      report collections
2.6      data marts
2.7      business data vault
2.8      operational data vault

3.0     common attributes
3.1      sequence numbers
3.2      sub sequence numbers (item numbering)
3.3      load dates
3.4      load end dates
3.5      last seen dates
3.6      extract dates
3.7      record creation dates
3.8      multi-temporal date structures
3.9      record sources
3.10   process ids

4.0     hub entities
4.1      hub definition and purpose
4.2      what is a business key?
4.3      where do we find business keys?
4.4      why are business keys important?
4.5      why not surrogate keys as “master keys”?
4.6      hub smart keys, intelligent keys
4.7      hub composite business keys
4.8      hub entity structure
4.9      hub examples
4.10   dependent and non-dependent child keys
4.11   mining patterns in the hub entity
4.12   process of building a hub table
4.13   modeling rules and standards for hub tables
4.14   what happens when the hub standards are broken

5.0     link entities
5.1      link definition and purpose
5.2      reasons for many to many relationships
5.3      flexibility
5.4      granularity
5.5      dynamic adaptability
5.6      scalability
5.7      link entity structure
5.8      link examples
5.9      link-to-link (parent/child relationships)
5.10   link applications
5.11   hierarchical links
5.12   same-as links
5.13   begin and end dating links
5.14   low value links
5.15   transactional links
5.16   computed aggregate links
5.17   vector links (directional)
5.18   strength and confidence ratings in links
5.19   exploration links
5.20   capturing changes to business rules over time

6.0     satellite entities
6.1      satellite definition and purpose
6.2      satellite entity structure
6.3      satellite examples
6.4      satellites arranged by source systems
6.5      satellite applications
6.6      record tracking satellites
6.7      status tracking satellites
6.8      computed satellites (quality generated)
6.9      splitting satellites
6.10   consolidating satellites

7.0     query assistant tables
7.1      point in time tables
7.2      bridge tables

8.0     reference tables
8.1      code and descriptions
8.2      national drug codes
8.3      icd9 diagnosis codes
8.4      calendars (financial and gregorian)

9.0     additional data vault thoughts
9.1      introduction to a business based data vault
9.2      metadata and the data vault model
9.3      master data and the data vault model
9.4      growth patterns and the architecture

table of figures

figure 1-1: example e-r diagram (elmasri/navathe)
figure 1-2: crow’s foot and arrow notation example
figure 1-3: small example ontology for vehicle
figure 1-4: abbreviations and naming conventions
figure 1-5: example data vault
figure 1-6: flexibility of adapting to change
figure 1-7: 3rd normal form product and supplier example
figure 1-8: applied set theory for the data vault
figure 2-1: enterprise bi architectural components
figure 3-1: time series batch loaded data
figure 3-2: real-time arrival, data geology
figure 3-3: load date time stamp and record source
figure 3-4: example load date time stamp data
figure 3-5: load end date computations, descriptive data life cycle
figure 3-6: structures containing last seen dates
figure 3-7: left: scan all data in edw; right: scan reduced set
figure 3-8: reduced scan set after applying last seen date
figure 4-1: business key changing across line of business
figure 4-2: hub example images
figure 4-3: hub example data
figure 4-4: smart key example
figure 4-5: composite business key hub example
figure 4-6: example hub entity structure
figure 4-7: example hubs from adventureworks 2008
figure 4-8: example of national drug code data vault
figure 4-9: dependent child relationship modeling
figure 4-10: typical hub row sizing
figure 5-1: relationship changes over time
figure 5-2: link table structure housing multiple relationships
figure 5-3: starting model before changes
figure 5-4: data vault after modification
figure 5-5: additional data vault model – more changes
figure 5-6: global data vault linking
figure 5-7: uncovering fact table grain
figure 5-8: data vault grain, representing star schema
figure 5-9: traditional data vault storage layout
figure 5-10: performance physical split version 1
figure 5-11: performance physical split version 2
figure 5-12: performance physical split version 3


dan linstedt


5 Responses to “Technical Modeling Book is Coming….”

  1. av1234 2010/06/08 at 3:14 am #

    I notice that there is a section in your proposed book on end-dating links. This is something that I am considering for a Data Vault application, but I have not found any previous mention of end-dating links in the literature, only of end-dating satellites.

    Is the principle of end-dating links the same as for end-dating satellites, and are there any catches that I should watch out for?

  2. dlinstedt 2010/06/08 at 4:19 am #


    End-Dating Links is NOT something you should do, ever. But be that as it may – the section covers one (only one) exception of when you can do it and why. The section also covers what happens when you DO end-date links, and the risks you take by implementing the solution (breaking standards). Technically it is a similar principle to end-dating Satellites.

    In short: only transactional links (links with no children) can be end-dated. But this is NOT the best practice. The best practice is to add a time-series Satellite that shows when each relationship starts and ends over time.

    Dan L
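
A minimal sketch of the time-series Satellite idea from the reply above, not taken from the book. The row layout (`link_id`, `start_date`, `end_date`) and the function name `active_links` are assumptions made for illustration. The point it shows is that the link rows themselves are never end-dated; only the satellite rows hanging off them carry start and end dates.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class EffectivitySatRow:
    """One row of a hypothetical time-series satellite on a link."""
    link_id: int                # sequence number of the parent link row
    start_date: date            # when the relationship became active
    end_date: Optional[date]    # None means the relationship is still active


def active_links(sat_rows: List[EffectivitySatRow], as_of: date) -> List[int]:
    """Return the link ids whose relationship is active on the given date.

    A row is treated as active from start_date (inclusive) up to
    end_date (exclusive); that boundary convention is an assumption.
    """
    return [
        row.link_id
        for row in sat_rows
        if row.start_date <= as_of and (row.end_date is None or as_of < row.end_date)
    ]


# Hypothetical data: link 1 was superseded by link 2 on 2010-06-08.
sat_rows = [
    EffectivitySatRow(link_id=1, start_date=date(2010, 6, 1), end_date=date(2010, 6, 8)),
    EffectivitySatRow(link_id=2, start_date=date(2010, 6, 8), end_date=None),
]
before_change = active_links(sat_rows, date(2010, 6, 5))
after_change = active_links(sat_rows, date(2010, 6, 9))
```

Both link rows stay in place permanently; the history of which one applied at a given time is answered entirely from the satellite.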

  3. av1234 2010/06/10 at 2:41 am #


    Thanks for the clarification. Referring to the example in Data Vault Basics: if, for example, the shipper of an order changes, then you are saying that the SAT_SHIPMENT record should be end-dated, and a new LNK_SHIPMENT and SAT_SHIPMENT record created?

  4. dlinstedt 2010/06/10 at 3:57 am #


    Yes, I am saying that if something in the link changes – and it’s NOT a driving key, then the OLD record in the satellite needs to be end-dated. In this case, Order number is the driving key (doesn’t change/shouldn’t change) – where Shipper CAN change (because of mistakes or other reasons), you are correct in your assumption.

    Hope this helps,
    Dan L
    Please don’t forget to tell people about my site & refer them here.
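
The exchange above can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's loading code: the class `ShipmentVault`, its fields, and the use of an (order number, shipper) pair as the link key are hypothetical, and the real LNK_SHIPMENT and SAT_SHIPMENT layouts are not shown in this post. Order number plays the role of the driving key; when the shipper changes, the old satellite record is end-dated and a new link and satellite record are created.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple

LinkKey = Tuple[str, str]  # (order_no, shipper); an assumed, simplified link key


@dataclass
class SatRow:
    """Hypothetical SAT_SHIPMENT-style row attached to one link row."""
    link_key: LinkKey
    load_date: datetime
    load_end_date: Optional[datetime] = None  # None means the row is current


class ShipmentVault:
    """Toy in-memory stand-in for a link table and its satellite."""

    def __init__(self) -> None:
        self.links: List[LinkKey] = []   # insert-only, never end-dated
        self.sats: List[SatRow] = []

    def load(self, order_no: str, shipper: str, load_date: datetime) -> None:
        key = (order_no, shipper)
        # Find the current satellite row for the driving key (order number).
        current = next(
            (s for s in self.sats
             if s.link_key[0] == order_no and s.load_end_date is None),
            None,
        )
        if current is not None and current.link_key == key:
            return  # same shipper as before, nothing to record
        if current is not None:
            current.load_end_date = load_date  # end-date the OLD satellite row
        if key not in self.links:
            self.links.append(key)             # new link row for the new pairing
        self.sats.append(SatRow(key, load_date))


vault = ShipmentVault()
t1, t2, t3 = datetime(2010, 6, 1), datetime(2010, 6, 8), datetime(2010, 6, 9)
vault.load("ORD-1", "SHIP-A", t1)
vault.load("ORD-1", "SHIP-B", t2)  # shipper changes for the same order
vault.load("ORD-1", "SHIP-B", t3)  # repeat load, no change detected
```

The link list only ever grows, so the link itself is never end-dated; the satellite carries the timeline for each order.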

  5. av1234 2010/06/10 at 7:03 am #

    In that case, wouldn’t new LNK_ORD_ITEM, and consequently SAT_ORDITM, records also have to be created to link to the new LNK_SHIPMENT record?

    Could this knock-on change be avoided by joining the LNK_ORD_ITEM to the HUB_ORDERS record instead of the LNK_SHIPMENT, or is there a catch with doing this that I haven’t spotted?
