Physical Data Models and Ontologies
i believe there is a strong correlation between ontologies and data models, or at least there should be. data models (just like hierarchical file system definitions, or xsds) define structure. when they include primary and foreign key relationships, they also define hierarchies. imagine that! parent-child relationships defining an ontology. now that’s cool stuff, man!
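to make the parent-child idea concrete, here is a minimal python sketch that turns a list of foreign keys into a hierarchy. the table names and the `FOREIGN_KEYS` list are made up for illustration, and the sketch assumes there are no circular or self-referencing keys.

```python
from collections import defaultdict

# hypothetical foreign keys, written as (child_table, parent_table)
FOREIGN_KEYS = [
    ("order_line", "order_hdr"),
    ("order_hdr", "cust"),
    ("cust_addr", "cust"),
]

def build_hierarchy(foreign_keys):
    """Map each parent table to the list of tables that reference it."""
    children = defaultdict(list)
    for child, parent in foreign_keys:
        children[parent].append(child)
    return children

def roots(foreign_keys):
    """Tables that are referenced by others but reference nothing themselves."""
    parents = {parent for _, parent in foreign_keys}
    kids = {child for child, _ in foreign_keys}
    return sorted(parents - kids)

def render(children, table, depth=0):
    """Return the hierarchy under `table` as indented lines (assumes no cycles)."""
    lines = ["  " * depth + table]
    for child in sorted(children.get(table, [])):
        lines.extend(render(children, child, depth + 1))
    return lines

hierarchy = build_hierarchy(FOREIGN_KEYS)
tree_lines = [line for root in roots(FOREIGN_KEYS) for line in render(hierarchy, root)]
print("\n".join(tree_lines))
```

the indented output is the "hidden" hierarchy: one root table with everything that hangs off it nested underneath.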
you see, i view data models as “technical definitions” or technical terms. why? because they hold the physical names of data elements, which usually make no sense to the end user (the business user). everyone who does data modeling knows the difference between logical and physical names, logical and physical models, etc. over the years, i’ve noticed a trend among data modelers (me included) to tie physical names to logical names where possible. this can’t be done in every case, especially if the logical model represents a concept that breaks out into multiple tables or multiple fields in the physical structure. but that mapping should (or at least could) be represented by an ontology.
ok, back to the task at hand. why is it that we all know and love (or at least understand) data profiling, but we don’t know about, or haven’t even considered, metadata profiling (i.e. terminology profiling against the structure of the data model)? shameless plug: this is something that rapidace does…
metadata profiling, or structure profiling, is a powerful concept. in order to make this work, we need to describe a basic ontology of the terms within the physical data model. but it goes deeper than that: as a part of that ontology we also need to include prefixes, suffixes, common abbreviations, synonyms, homonyms, and antonyms. all of these things should be rolled out to an owl or rdf data structure, then imported into a “data model profiling application”.
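as a rough illustration of what such a term ontology buys you, the python sketch below expands physical names into logical terms. plain dicts stand in for the owl/rdf export, and every entry in `PREFIXES`, `SUFFIXES`, `ABBREVIATIONS` and `SYNONYMS` is a made-up example, not a real vocabulary. homonyms and antonyms are left out to keep it short.

```python
import re

# hypothetical vocabulary; in practice this would be loaded from the ontology
PREFIXES = {"tbl", "dim", "fact"}
SUFFIXES = {"id": "identifier", "cd": "code", "dt": "date", "nm": "name"}
ABBREVIATIONS = {"cust": "customer", "addr": "address", "ord": "order"}
SYNONYMS = {"client": "customer", "purchase": "order"}

def tokenize(physical_name):
    """Split a physical name on underscores and camelCase boundaries."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", physical_name)
    return [token for token in spaced.lower().split("_") if token]

def to_logical_terms(physical_name):
    """Expand a physical name into a list of normalized logical terms."""
    tokens = tokenize(physical_name)
    # drop a known leading prefix such as "tbl"
    if len(tokens) > 1 and tokens[0] in PREFIXES:
        tokens = tokens[1:]
    terms = []
    for position, token in enumerate(tokens):
        is_last = position == len(tokens) - 1
        if is_last and token in SUFFIXES:
            token = SUFFIXES[token]           # expand a trailing suffix
        token = ABBREVIATIONS.get(token, token)  # expand abbreviations
        token = SYNONYMS.get(token, token)       # collapse synonyms
        terms.append(token)
    return terms

for name in ("TBL_CUST_ADDR", "cust_id", "clientId"):
    print(name, "->", to_logical_terms(name))
```

once two differently named fields (say `cust_id` and `clientId`) expand to the same logical terms, the profiling application has something it can actually compare.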
by constructing the ontologies that are hidden in the data models, we can begin applying fuzzy logic to perform tasks like structural matching, structural super/sub-set detection, percent completeness, correlation analysis, and confidence ratings that “models-schemas-tables-fields are either the same or different”, even though two fields may both be called “id”. this data model profiling tool needs to use any descriptions embedded in the data model as well; this will help with cross-referencing and with establishing context for the physical data field…
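here is one way a few of those measures could be sketched in python. this is not how any particular tool does it: the three tables are invented, string similarity from the standard library's `difflib.SequenceMatcher` stands in for real fuzzy logic, and the `name_weight` of 0.4 is an arbitrary assumption.

```python
from difflib import SequenceMatcher

# hypothetical tables, written as field name -> embedded description
CUSTOMER = {"id": "unique customer number", "name": "customer full name",
            "email": "contact email"}
CLIENT = {"id": "unique client number", "name": "client full name",
          "email": "contact email", "phone": "contact phone"}
PRODUCT = {"id": "unique product number", "sku": "stock keeping unit"}

def compare_structures(a, b):
    """Sub/super-set flags, percent completeness and overlap for two tables."""
    fields_a, fields_b = set(a), set(b)
    shared = fields_a & fields_b
    union = fields_a | fields_b
    return {
        "a_subset_of_b": fields_a <= fields_b,
        "b_subset_of_a": fields_b <= fields_a,
        # share of a's fields that also exist in b
        "pct_of_a_in_b": 100.0 * len(shared) / len(fields_a) if fields_a else 0.0,
        # shared fields divided by all distinct fields (jaccard index)
        "jaccard": len(shared) / len(union) if union else 0.0,
    }

def field_confidence(name_a, desc_a, name_b, desc_b, name_weight=0.4):
    """Blend name and description similarity into a 0..1 confidence score."""
    name_score = SequenceMatcher(None, name_a, name_b).ratio()
    desc_score = SequenceMatcher(None, desc_a, desc_b).ratio()
    return name_weight * name_score + (1 - name_weight) * desc_score

print(compare_structures(CUSTOMER, CLIENT))
print(compare_structures(CUSTOMER, PRODUCT))
print(field_confidence("id", CUSTOMER["id"], "id", PRODUCT["id"]))
```

the last call is the “id” problem in miniature: the names match perfectly, but because the descriptions differ the confidence drops below 1.0. that is why the embedded descriptions matter.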
ok – i ramble…. yes, i’ve got a tool i’m writing that should be available maybe next year?? that does a lot of this…
the point is this: there’s a lot of contextual information to be retrieved from (physical) data models world-wide. there’s an endless combination of processes one can apply to disparate data models, like merging, mixing, matching, combining, separating, etc. you might even call it “etl for metadata”… ouch, that hurt (i don’t like the term).
data models can and should be viewed as ontologies. combine the ontological information with contextual information (a little elbow grease and hard work to augment it), and voila, you’ve got a really powerful system. this is something you could potentially use to shorten the life-cycle of due diligence in the m&a world, or even in data model consolidations…
what do you think about data models and ontologies? i want to know…
dan l
danl@danlinstedt.com