Please note: RapidACE has been decommissioned and removed from the marketplace; it is no longer accessible.
Sorry for my long silence; I've been hard at work completely rebuilding the core of RapidACE, and will soon make it available as SaaS. Anyhow, this entry is about code generation for the Data Vault: what you should consider, what you need to demand from vendors whose tools generate ETL/ELT code, and what you should come to expect in this market.
It's a well-known fact that I've been working on RapidACE. The acronym stands for Rapid Architecture Consolidation Enablement (or Engine), whichever you prefer. The whole notion of "maturing" your EDW is to get the entire architecture, from ETL to data model to project, under control; basically, to formalize it using standards and best practices. Once it has been assigned, designed, and prescribed, usually with a bit of brow sweat and up-front hard work, the proper solution can be generated with near 90% accuracy.
Of course, your mileage may vary depending on the vendor you choose... especially if that vendor has not elected to work with me. However, back to the point: there are certain things you should ask of the vendor providing the tool that generates your Data Vault models. Hopefully I can shed some light on these considerations.
The inputs required by RapidACE (the new version) are full and complete. They demand up-front work in the definition and understanding of business keys, particularly since RapidACE works to consolidate multiple disparate models together. The inputs you should be providing your tool include the following (a few sketches of what these inputs might look like follow the list):
- Source data model – preferably in DDL format, but it could be XML, XSD, DTD, COBOL copybooks, or standard table structures.
- Target data model – the same as above.
- Ontology – RDF or OWL format – used to describe abbreviations, shared definitions, business keys, and a number of other things.
- Mapping design preference – what good is a generator that doesn't use a template system to produce its mappings for you?
- ETL naming conventions – for whatever ETL, ELT, or SQL you are generating, you should be able to specify the naming conventions you'd like to use.
- Data model naming conventions – these should specify the prefixes and suffixes used to classify and identify table structures in the DDL.
- Normalization preferences – applied when reading XML, XSD, DTD, or COBOL copybooks – the model should be normalized first, before consolidation.
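To make the shape of these inputs concrete, here is a minimal sketch of how such an input bundle might be modeled in Python. Everything here (the class name, the default prefixes, the path conventions) is an illustrative assumption on my part, not RapidACE's actual interface:

```python
from dataclasses import dataclass, field

# Hypothetical input bundle for a Data Vault code generator.
# Names and defaults are illustrative, not any vendor's real API.
@dataclass
class GeneratorInputs:
    source_models: list[str]   # paths to DDL, XML/XSD/DTD, or COBOL copybooks
    target_model: str          # path to the target DDL
    ontology: str              # RDF/OWL file: abbreviations, definitions, business keys
    mapping_template: str      # template that drives mapping/ETL generation
    # ETL naming conventions: one pattern per generated load type
    etl_naming: dict[str, str] = field(default_factory=lambda: {
        "hub_load": "stg_to_hub_{entity}",
        "link_load": "stg_to_lnk_{entity}",
        "sat_load": "stg_to_sat_{entity}",
    })
    # Data model naming conventions: prefixes that classify table structures
    model_naming: dict[str, str] = field(default_factory=lambda: {
        "hub": "hub_", "link": "lnk_", "satellite": "sat_",
    })
    # Normalization preference: normalize hierarchical sources before consolidation
    normalize_first: bool = True
```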
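The ontology input, likewise, is just a machine-readable file. A hedged sketch of pulling business-key declarations out of an RDF/OWL file with the rdflib library might look like this; the file name and the predicate URI are invented for illustration:

```python
from rdflib import Graph, URIRef

# Load a (hypothetical) enterprise ontology serialized as RDF/XML.
g = Graph()
g.parse("enterprise_ontology.owl", format="xml")

# Invented predicate marking which column is a concept's business key.
BUSINESS_KEY = URIRef("http://example.org/dv#businessKey")

# Each matching triple says: this business concept is keyed by this column.
for concept, _, key_column in g.triples((None, BUSINESS_KEY, None)):
    print(f"{concept} is keyed by {key_column}")
```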
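As for the mapping design preference: a template system simply means the generator renders SQL (or ETL metadata) from a template plus model metadata, so you can change the pattern without changing the engine. A toy sketch using Jinja2, with a hub-load pattern of my own invention, might look like this:

```python
from jinja2 import Template

# Illustrative hub-load pattern; a real tool would ship far richer templates.
HUB_LOAD_SQL = Template("""\
INSERT INTO {{ naming.hub }}{{ entity }}
    (hub_{{ entity }}_key, {{ business_key }}, load_dts, record_src)
SELECT DISTINCT
    md5(upper(trim(s.{{ business_key }}))),
    s.{{ business_key }},
    current_timestamp,
    '{{ source }}'
FROM stg_{{ source }}_{{ entity }} s
WHERE NOT EXISTS (
    SELECT 1
    FROM {{ naming.hub }}{{ entity }} h
    WHERE h.{{ business_key }} = s.{{ business_key }}
);
""")

# Render one concrete mapping from model metadata plus naming conventions.
print(HUB_LOAD_SQL.render(
    entity="customer",
    business_key="customer_number",
    source="crm",
    naming={"hub": "hub_"},
))
```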
And there are a few more that I can think of... However, one of the points of this entry is to show you that generating ETL/ELT code for your environment, particularly for multiple-model situations, is not necessarily an easy process, nor is it cut-and-dried. The largest issue I've heard about is the lack of up-front work done to identify business keys. It's the business keys in the Data Vault that make all the difference.
One more thought on the subject...
Most tools, even if they generate just ETL, should use the above inputs at a minimum. If they don't, then you should be asking them: why not?
At the end of the day, if nothing else, you should remember this: you can't, or shouldn't, forward-engineer (automatically generate) a Data Vault model directly 1-for-1 from the source model. This is not the proper way to do business, and actually following this line of thinking can turn your Data Vault model into mush. If you or your vendor want to generate Data Vault models, you (or your vendor) must undertake the task of identifying and using business keys in the generation. This is the proper, and only, way to truly auto-generate a Data Vault model, along with the ETL code underneath. A small sketch of this idea follows.
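To illustrate the difference in miniature: suppose the ontology declares the business key for "customer" and tells you that cust_no is an abbreviation of customer_number (both the alias table and the key declaration below are invented stand-ins for that ontology content). Two disparate source tables then collapse into one hub, instead of mechanically becoming two 1-for-1 hubs:

```python
# Column aliases (abbreviations) as an ontology would declare them.
key_aliases = {"cust_no": "customer_number"}

# Two disparate source tables describing the same business concept.
source_columns = {
    "crm.customers":       ["cust_no", "name", "segment"],
    "billing.cust_master": ["customer_number", "credit_limit"],
}

# Business concept -> declared business key (from the ontology).
declared_keys = {"customer": "customer_number"}

# Resolve aliases, then group source tables by shared business key:
hub_sources: dict[str, list[str]] = {}
for table, cols in source_columns.items():
    resolved = {key_aliases.get(c, c) for c in cols}
    for concept, key in declared_keys.items():
        if key in resolved:
            hub_sources.setdefault(f"hub_{concept}", []).append(table)

# Both tables feed ONE hub -- a 1-for-1 mirror would have created two.
print(hub_sources)  # {'hub_customer': ['crm.customers', 'billing.cust_master']}
```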
A lot of engineering thought must go into the proper preparations, or the model will not fit the bill, nor rise to the occasion.
Thoughts? What are your experiences?