Value of PIT and Bridge Tables in Data Vault 2.0

This is a short entry about the value received when properly implementing PIT and Bridge tables as defined in Data Vault 2.0 (Building a Scalable Data Warehouse with Data Vault 2.0) – the link is to the Canadian Amazon site because I am in Canada as I write this entry.

What is a PIT or a Bridge?

Let me start off by offering brief definitions (the full definitions are offered in the book, and in my CDVP2 – Certified Data Vault 2.0 – class).  You can find the class at: http://DataVaultCertification.com

A PIT is a Point-In-Time table: a system-driven Satellite loaded with a Hub or Link hash key, the Hub's business key(s), and the surrounding Satellites' primary key values.  These PKs are a full copy of the entire PK housed in each Satellite.  It is absolutely vital that you follow the standard and carry the full PK, and not just copy the "load date only" into the PIT table.  This, again, is explained in detail in the certification class and in the book.
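To make the structure concrete, here is a minimal sketch of a PIT table's shape, using SQLite.  The table and column names (a `Customer` hub with a name Satellite and an address Satellite) are illustrative assumptions, not the book's standards; the point is that the PIT row carries the FULL composite PK of each Satellite (hash key plus load date), never the load date alone.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE pit_customer (
    snapshot_date  TEXT NOT NULL,  -- snapshot grain of the PIT
    customer_hk    TEXT NOT NULL,  -- hub hash key
    customer_bk    TEXT NOT NULL,  -- hub business key
    -- full PK of the name Satellite as of the snapshot:
    sat_name_hk    TEXT NOT NULL,
    sat_name_ldts  TEXT NOT NULL,
    -- full PK of the address Satellite as of the snapshot:
    sat_addr_hk    TEXT NOT NULL,
    sat_addr_ldts  TEXT NOT NULL,
    PRIMARY KEY (snapshot_date, customer_hk)
);
""")

# One snapshot row: each Satellite contributes its entire PK,
# pointing at the row current as of the snapshot date.
con.execute(
    "INSERT INTO pit_customer VALUES (?,?,?,?,?,?,?)",
    ("2016-06-01", "HK1", "CUST-001",
     "HK1", "2016-05-28",   # name Satellite PK
     "HK1", "2016-05-15"))  # address Satellite PK
row = con.execute("SELECT * FROM pit_customer").fetchone()
```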

What is a Bridge Table?

A Bridge table is quite different in structure: it houses Hub and Link keys only (for starters), and is a cross-combination of keys governed by a WHERE clause (the number of rows is controlled by a business use case / requirement) for just what the business wants to see.  No, this is not the same as a fact table.  Again, the standards are defined in the class and in the book.
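The following sketch shows the idea under assumed names (a customer–order Link and an order-status Satellite, neither from the book): the Bridge holds hub/link keys only, and the business rule in the WHERE clause, here "open orders only", is what governs the row count.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE link_customer_order (link_hk TEXT, customer_hk TEXT, order_hk TEXT);
CREATE TABLE sat_order_status (order_hk TEXT, ldts TEXT, status TEXT);
-- Bridge: hub/link keys only, rows limited by a business requirement.
CREATE TABLE bridge_cust_order (snapshot_date TEXT, customer_hk TEXT,
                                order_hk TEXT, link_hk TEXT);
""")
con.executemany("INSERT INTO link_customer_order VALUES (?,?,?)",
                [("L1", "C1", "O1"), ("L2", "C1", "O2")])
con.executemany("INSERT INTO sat_order_status VALUES (?,?,?)",
                [("O1", "2016-05-01", "OPEN"), ("O2", "2016-05-01", "CLOSED")])

# The WHERE clause encodes the business use case and controls the row count.
con.execute("""
INSERT INTO bridge_cust_order
SELECT '2016-06-01', l.customer_hk, l.order_hk, l.link_hk
FROM link_customer_order l
JOIN sat_order_status s ON s.order_hk = l.order_hk
WHERE s.status = 'OPEN'
""")
rows = con.execute("SELECT customer_hk, order_hk FROM bridge_cust_order").fetchall()
```

Only the ("C1", "O1") combination survives the filter; the closed order never enters the Bridge.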

What are the similarities?

The following is a list of similarities:

  • Snapshot-based, driven by a schedule
  • Sit on the Business Data Vault side – which means you can add computed fields, and copies of Satellite fields needed in WHERE clauses issued by ad-hoc user queries (through views)
  • Contain raw data and key structures

What is the purpose of each of these tables?  Why build them?

Well, quite frankly, the loading process that builds each of them houses LEFT OUTER JOIN components.  Before you climb my tree on performance, please note: they should be run frequently, which means they should only go after the "latest data sets", which means the data they are left-outer-joining should be "thin".
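A minimal sketch of such a load, again in SQLite with assumed names: the LEFT OUTER JOIN pulls the latest Satellite row at or before the snapshot date, and still produces a PIT row for a hub key that has no Satellite data yet (in practice a ghost/zero record would replace the NULLs; that detail is omitted here).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE hub_customer (customer_hk TEXT, customer_bk TEXT);
CREATE TABLE sat_name (customer_hk TEXT, ldts TEXT);
CREATE TABLE pit_customer (snapshot_date TEXT, customer_hk TEXT, customer_bk TEXT,
                           sat_name_hk TEXT, sat_name_ldts TEXT);
""")
con.executemany("INSERT INTO hub_customer VALUES (?,?)",
                [("HK1", "CUST-001"), ("HK2", "CUST-002")])  # HK2: no sat row yet
con.execute("INSERT INTO sat_name VALUES ('HK1', '2016-05-28')")

# LEFT OUTER JOIN to the latest Satellite row at or before the snapshot;
# restricting the Satellite to recent loads is what keeps the join "thin".
con.execute("""
INSERT INTO pit_customer
SELECT '2016-06-01', h.customer_hk, h.customer_bk,
       s.customer_hk, MAX(s.ldts)
FROM hub_customer h
LEFT OUTER JOIN sat_name s
  ON s.customer_hk = h.customer_hk AND s.ldts <= '2016-06-01'
GROUP BY h.customer_hk, h.customer_bk
""")
rows = sorted(con.execute(
    "SELECT customer_hk, sat_name_ldts FROM pit_customer").fetchall())
```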

The purposes of each of these tables are:

  • To eliminate all outer joins from the ad-hoc layers above, allowing equi-joins to take place
  • To provide full scalability in views, or virtual dimensions and virtual facts, on top of the Data Vault
  • To BUFFER data release in accordance with SLAs regardless of data arrival rates (i.e., think microsecond real-time inserts)
  • To co-locate data and indexes (in an MPP solution) where necessary
  • To enable table elimination by the ad-hoc query engine going through the views on top of these tables
  • To enhance partitioning
  • To enhance performance
  • To allow any "star-join" optimizations to be utilized (when offered by the query resolution engine)
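The first point above is the big one, and it can be sketched as follows (same assumed names as before): once the PIT carries the full Satellite PK per snapshot, the view layer needs only equi-joins on (hash key, load date), no outer joins at all.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE pit_customer (snapshot_date TEXT, customer_hk TEXT,
                           sat_name_hk TEXT, sat_name_ldts TEXT);
CREATE TABLE sat_name (customer_hk TEXT, ldts TEXT, name TEXT);
""")
con.execute("INSERT INTO sat_name VALUES ('HK1', '2016-05-28', 'Alice')")
con.execute("INSERT INTO pit_customer VALUES "
            "('2016-06-01', 'HK1', 'HK1', '2016-05-28')")

# What a view on top of the PIT would issue: pure equi-joins on the
# full Satellite PK, which the optimizer can resolve without outer joins.
result = con.execute("""
SELECT p.customer_hk, s.name
FROM pit_customer p
JOIN sat_name s
  ON s.customer_hk = p.sat_name_hk
 AND s.ldts        = p.sat_name_ldts
WHERE p.snapshot_date = '2016-06-01'
""").fetchall()
```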

Keep in mind that this is one of the single most powerful techniques inside Data Vault 2.0, and that utilizing it can enable you to offer VIRTUAL (view-based) dimensions and facts for 95% of your user-based requests, eliminating the need to physicalize yet another copy of the data in downstream information marts.

When properly applied, they work well in all environments and can provide tremendous value to the business as well as to IT.

PS: Kent Graziano has some super blogs about virtualization and views, and the building of PIT and Bridge structures.  You should check them out!

As always, I welcome comments below.

Hope this helps,

(C) Dan Linstedt, 2016 all rights reserved.
