it has come to my attention that the industry has a misconception, and a perception issue, and a misunderstanding of what multi-active satellites are and defined to be. in this entry i hope to pull away the covers, and clear the perception issues. please keep in mind, this is taught in my cdvp2 data vault classes for certification.
what is a multi-active satellite?
a multi-active satellite is a table modeled with a satellite structure, that contains multiple active records per key, per point in time.
why build one?
when the source system has multiple descriptive rows that are all valid and active at the same time, for a specific business key, the warehouse needs a way to represent these. unfortunately the descriptive rows do not have a unique business key of their own (or this problem would never present itself in this manner).
what a multi-active satellite is not:
there seems to be an attempt to “create” a multi-active satellite in a target model when: the source system has multiple inter and intra day changes that happen between batch cycle based loads. this does not qualify as a multi-active satellite. if you have modeled a multi-active satellite based on this requirement, then you have built an incorrect target data vault model, and it needs to be corrected.
what are intra-day and inter-day updates?
changes made to source system data / edits really, where the primary key doesn’t change. unfortunately, batch load cycles still exist to pull the source data, so designers incorrectly assume: just to create a multi-active satellite. people make this mistake, even though, the source says: only one record is “active” at any given point in time, it is just receiving multiple updates throughout the day.
how does this mistake happen?
designers incorrectly assign a batch load date to the incoming data, which makes the pk no longer unique in the satellite record. then, they say: well, i have multiple intra/inter day changes to accommodate, so we will just make it a multi-active sat and be done with it. this is the mistake that is often made.
what do to about the mistake:
go back to class, study the multi-active satellite. the real definition offered here. if you have the case stated above (mistaken identity if you will), and you are still insisting on using load date = batch date, then you will continue to have problems.
the designer needs to understand: in data vault 2.0: the load date for the satellite (for any table in the data vault) is the date and time of insert of the data to the target data vault table. if this is still not fine-grained enough, then add a sub-sequence for ordering of the batch level changes.
remember: multiple intra and inter day changes do not make for a multi-active satellite unless the source system says: it has multiple active descriptive records at the same time for a single point in time.
i hope this clears the air.
(c) dan linstedt, 2016, all rights reserved