DV modeling for CLOB, LOB, BIN, IMAGE…

This post is all about how to build the proper Data Vault model to handle CLOB, BLOB, LOB, BIN, BINARY, and IMAGE fields.  Basically, fields that are binary in nature.  In this _very_ short post I cover where to put these fields in the Data Vault model and why.  In my new one-on-one coaching section I will teach you how to use these fields in ETL/ELT.

See my posting about my new one-on-one coaching.  It is open to a limited number of signups for a 3-month program, and there is generally a waiting list for the start of the next program.  Contact me now at coach@danlinstedt.com for more information.

These fields are thorny, problematic, and hard to deal with in a data warehousing sense, and many times people make mistakes in the Data Vault model and don’t place these fields in the right structures.  Remember what I teach: that it all begins and ends with the data model architecture.  Make a mistake in the architecture, and it will cost you dearly!

So, where do we put these fields?

These types of fields belong in their OWN separate Satellite: that’s right, all by themselves.  You can add another column to the Satellite called a hash-key (if you want).  To fill it, run the binary data through a hashing function and store the result.  The hash key can help you with DISTINCT clauses (removing duplicates from the staging table), and with comparisons to the “current” row in the Satellite.
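Here is a minimal sketch of both uses of the hash key, written in Python.  It assumes SHA-256 as the hashing function (the post does not name one), and the function names `image_hash_key`, `distinct_by_hash`, and `has_changed` are made up for illustration.

```python
import hashlib

def image_hash_key(data: bytes) -> str:
    """Return a 64-character hex digest of the binary payload (SHA-256 assumed)."""
    return hashlib.sha256(data).hexdigest()

def distinct_by_hash(staging_rows):
    """Drop staging rows whose (badge, image hash) pair was already seen."""
    seen = set()
    unique = []
    for badge, picture in staging_rows:
        key = (badge, image_hash_key(picture))
        if key not in seen:
            seen.add(key)
            unique.append((badge, picture, key[1]))
    return unique

def has_changed(current_hash, picture: bytes) -> bool:
    """True when there is no current Satellite row or the hashes differ."""
    return current_hash is None or current_hash != image_hash_key(picture)

# Hypothetical staging content: (person_badge, person_picture)
staging = [
    ("B-100", b"\x89PNG-first"),
    ("B-100", b"\x89PNG-first"),   # exact duplicate in staging
    ("B-200", b"\x89PNG-second"),
]

rows = distinct_by_hash(staging)
print(len(rows))                                   # 2
print(has_changed(rows[0][2], b"\x89PNG-first"))   # False
```

The comparison only ever touches the short hash strings, never the binary payloads themselves, which is the reason for storing the hash next to the image.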

Why separate the fields?

The point is: separate, divide and conquer.  It’s the only way to deal with these particular binary fields.  They are long, ugly, and unparsable by standard SQL actions.  Field separation is necessary so that standard structured content isn’t mixed with “unstructured” content.  We have no way of knowing in advance how long the fields are, and no way of directly comparing them to each other across rows.  Manipulating these fields typically requires special handling routines, which end up being slow to execute.

Example Please!!

You asked for it…  so here it is:  suppose I have a staging table:



   person_badge   varchar(25),
   person_name    varchar(50),
   person_addr    varchar(50),
   person_phone   numeric(12),
   person_picture image,
   primary key (person_badge)



Ultimately what you need in the Data Vault model should look like this:


create table HUB_PERSON (
   person_sqn        numeric(12),
   person_badge      varchar(25),
   person_load_date  date,
   person_rec_source varchar(25),
   primary key (person_sqn)
);


create table SAT_PERSON_INFO (
   person_sqn                numeric(12),
   person_info_load_date     date,
   person_info_load_end_date date,
   person_info_rec_source    varchar(25),
   person_name               varchar(50),
   person_addr               varchar(50),
   person_phone              numeric(12),
   primary key (person_sqn, person_info_load_date)
);


create table SAT_PERSON_IMAGE (
   person_sqn               numeric(12),
   person_img_load_date     date,
   person_img_load_end_date date,
   person_img_rec_source    varchar(25),
   person_picture           image,
   person_img_hash_key      varchar(64),
   primary key (person_sqn, person_img_load_date)
);



This allows you to construct a hash key.  Make the length of the hash 32, 64, 128, or 256, depending on how unique you want it to be and how much “tolerance for misses” you can take.  The smaller the hash key, the larger the tolerance: a shorter hash is more likely to give two different images the same key, so the more changes you MIGHT miss.
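To make the lengths concrete, here is a small Python sketch.  It assumes the lengths are counted in hexadecimal characters, as they would be stored in a varchar column; the post does not say which hashing functions to use, so the algorithms below are simply common ones whose hex digests happen to have those lengths.

```python
import hashlib

picture = b"example image bytes"  # stand-in for a real person_picture value

# Assumed mapping from hash length to algorithm; not prescribed by the post.
digests = {
    "md5": hashlib.md5(picture).hexdigest(),
    "sha256": hashlib.sha256(picture).hexdigest(),
    "sha512": hashlib.sha512(picture).hexdigest(),
    # SHAKE-256 has variable output: 128 bytes gives 256 hex characters.
    "shake_256": hashlib.shake_256(picture).hexdigest(128),
}

lengths = {name: len(value) for name, value in digests.items()}
print(lengths)  # {'md5': 32, 'sha256': 64, 'sha512': 128, 'shake_256': 256}
```

Whichever length you pick, size the hash-key column to match; a 64-character digest fits the varchar(64) column in the example, while a longer digest would need a wider column.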

By separating the image, and the rest of the binary fields, into their own Satellite, you can effectively model the Data Vault and produce the proper architecture and handling mechanisms needed to run the ETL efficiently.
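As a rough illustration of what the split buys you at load time, the Python sketch below decides which Satellite needs a new row for an incoming staging record.  The structured columns are compared directly, while the picture is compared only by its hash.  The helper names, the sample values, and the choice of SHA-256 are assumptions for illustration, not a prescribed loading routine.

```python
import hashlib

INFO_COLUMNS = ("person_name", "person_addr", "person_phone")

def split_staging_row(row):
    """Split one staging row into a structured payload and a binary payload."""
    info = {column: row[column] for column in INFO_COLUMNS}
    picture = row["person_picture"]
    image = {
        "person_picture": picture,
        "person_img_hash_key": hashlib.sha256(picture).hexdigest(),
    }
    return info, image

def satellites_to_load(row, current_info, current_img_hash):
    """Name the Satellites that need a new row for this staging record."""
    info, image = split_staging_row(row)
    targets = []
    if info != current_info:                              # plain column comparison
        targets.append("SAT_PERSON_INFO")
    if image["person_img_hash_key"] != current_img_hash:  # hash comparison only
        targets.append("SAT_PERSON_IMAGE")
    return targets

# Hypothetical "current" Satellite state for one person
current_info = {"person_name": "Ann Lee", "person_addr": "1 Main St",
                "person_phone": 5551234567}
current_img_hash = hashlib.sha256(b"old-picture").hexdigest()

# Incoming staging row: the address changed, the picture did not
staging_row = {"person_badge": "B-100", "person_name": "Ann Lee",
               "person_addr": "9 Oak Ave", "person_phone": 5551234567,
               "person_picture": b"old-picture"}

targets = satellites_to_load(staging_row, current_info, current_img_hash)
print(targets)  # ['SAT_PERSON_INFO']
```

An address change writes a new row to SAT_PERSON_INFO only, so the large binary value is not copied again every time a structured column changes.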

DON’T FORGET TO CONTACT ME about reserving your spot in my one-on-one coaching session opening up soon!

Hope this helps,
Dan Linstedt


2 Responses to “DV modeling for CLOB, LOB, BIN, IMAGE…”

  1. marius 2010/06/11 at 2:48 am #

    Hi Dan

    I like the structure of this post, especially the example. It makes things a lot clearer.

    I look forward to more.


  2. dlinstedt 2010/06/11 at 4:00 am #

    Hi Marius,

    Thank-you for the kind comment and the feedback. I will attempt to produce more like this.

    Dan L
