jiminy crickets! it’s that time again!!! i have to say this is an area near and dear to my heart. language and semantic expression have been a part of my research since 1987. yes, you heard me right… 1987 (who would have thunk?). anyhow, i’m here to talk a bit about unstructured data, and more so, semi-structured data, in the form of data models, content, relational database structures, and hierarchical (cobol) structures.
for a long time i had a feature in rapidace called model consolidation. it did what other tools could not do… take 2 to n disparate data models and “merge” them together through proprietary algorithms. it was fairly effective. i was not the first, and i won’t be the last, to see that data models are nothing more than ontological expressions of “how we store and understand” our data. in other words, they lend context to our data sets and our structures.
oh yeah, what about unstructured data? i thought you were going to talk about that…
yep! it’s all a part of what i have to say… you see, unstructured data really is structured at the heart of it: it’s all bits (ones and zeros), so in reality it truly is structured! but understanding the unstructured data, that is, lending context to it, is where the genius comes in (that would be the human mind most of the time).
in working with data structures (cobol copybooks) or relational table structures, what we are really doing is lending context (albeit simple context) to what the content within a table or column is supposed to hold. these are labels on the data that make it understandable and consistently accessible to sql (structured query language), which, as we all know, rests on a mathematical formalization (relational algebra) for data access against these structures.
now, where’s the good part? take a data model and turn it into an ontology, and you can begin creating magic, especially when you apply business terms to the structures!! cobol copybooks are already hierarchical ontologies, organized into parent and child relationships. what’s so special about this? you can see it all over the world: there has been a ton of work done on this subject, and you can leverage it to your heart’s content… for free even (if you have enough gumption to figure out how to work the academic products that exist).
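to make the “copybooks are hierarchical ontologies” point a bit more concrete, here is a minimal python sketch that reads the level numbers in a copybook and emits parent/child triples. the copybook fragment, the `partOf` relation name, and the function name are all invented for illustration; this is not how rapidace does it, and it ignores things a real parser would have to handle (redefines, occurs, filler, 66/77/88 levels, continuation lines).

```python
import re

# hypothetical copybook fragment, invented for this example
COPYBOOK = """
01 CUSTOMER-RECORD.
   05 CUSTOMER-ID        PIC 9(8).
   05 CUSTOMER-NAME.
      10 FIRST-NAME      PIC X(20).
      10 LAST-NAME       PIC X(30).
   05 CUSTOMER-ADDRESS.
      10 CITY            PIC X(25).
      10 POSTAL-CODE     PIC X(10).
"""

# a two-digit level number followed by a data name
LINE = re.compile(r"^\s*(\d{2})\s+([A-Z0-9-]+)")

def copybook_to_triples(text):
    """return (child, 'partOf', parent) triples derived from level numbers."""
    stack = []    # (level, name) of the group items that are still open
    triples = []
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        level, name = int(m.group(1)), m.group(2)
        # an item closes every open item at the same or a deeper level
        while stack and stack[-1][0] >= level:
            stack.pop()
        if stack:
            triples.append((name, "partOf", stack[-1][1]))
        stack.append((level, name))
    return triples

triples = copybook_to_triples(COPYBOOK)
for child, relation, parent in triples:
    print(child, relation, parent)
```

the triples are about the simplest ontology shape there is; attaching business terms would just mean adding more triples (say, a label or synonym relation) for each name.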
look it up: it’s called ontology alignment…
type the term into google, and you begin to see different tools and articles written on the subject of aligning metadata… i.e. data structures.
what does this all mean? it means: 1) if you can turn your data model into an ontology, 2) you can absorb the ontology into one of the fancy tools, and 3) you can use these fancy tools to “merge” ontologies in a human-guided fashion. the end result? you produce a single, consistent, merged data model…
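here is a rough python sketch of what “human-guided merging” can look like at its very simplest: score label pairs from two models, hand the ranked candidates to a person, and merge only what the person accepts. the two models, the function names, and the 0.6 threshold are all assumptions made up for this example; i’m using plain string similarity from python’s standard `difflib`, which is a stand-in and not what the academic alignment tools (or my own proprietary algorithms) actually do.

```python
from difflib import SequenceMatcher

def normalize(label):
    # lower-case and treat "-" and "_" as spaces so naming styles compare fairly
    return label.lower().replace("_", " ").replace("-", " ")

def propose_matches(left, right, threshold=0.6):
    """rank candidate (score, left_label, right_label) pairs for human review."""
    candidates = []
    for a in left:
        for b in right:
            score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
            if score >= threshold:
                candidates.append((round(score, 2), a, b))
    return sorted(candidates, reverse=True)

def merge_models(left, right, accepted):
    """keep every left label, plus the right labels no accepted pair covers."""
    matched_right = {b for _, b in accepted}
    return list(left) + [b for b in right if b not in matched_right]

# two hypothetical models, invented for this example
model_a = ["CUSTOMER-ID", "FIRST-NAME", "LAST-NAME", "POSTAL-CODE"]
model_b = ["customer_id", "first_name", "surname", "postal_code", "order_total"]

candidates = propose_matches(model_a, model_b)

# the human step: accept the exact matches, and add one pairing by hand
accepted = [(a, b) for score, a, b in candidates if score == 1.0]
accepted.append(("LAST-NAME", "surname"))

merged = merge_models(model_a, model_b, accepted)
print(merged)
```

a purely lexical score will happily pair labels that look alike but mean different things, and miss labels that mean the same thing but look different, which is exactly why the human stays in the loop.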
here are some links you might be interested in:
http://www-01.ibm.com/software/data/infosphere/fasttrack/ (cross-walk management, etl gen) ($67,000) + expensive!!!
http://www.wherescape.com (etl generation + data profiling) (don’t know the price)
http://www.bi-ready.com (etl generation, data warehouse black-box)
http://www.qosqo.nl/ (opensource quipu product, model engineering + etl generation)
http://www.analytixds.com (cross-walk management, etl generation)
forgive me if i’ve got this wrong, or missed anything – feel free to comment to correct the post if you see fit.
by the way, just in case you’re thinking of trying this yourself… in your own product… be aware of the following patent application:
i will just say this: there’s power in these concepts, but getting it right is what matters. all of this (data model consolidation included) can be done from data model structures alone, but profiling the information makes it that much more powerful.
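a tiny python sketch of why profiling helps: compare the actual values in two columns instead of just their names. the column names and sample values below are invented, and value overlap (jaccard similarity) is only one simple profiling measure i picked for illustration, not a description of any particular product.

```python
def jaccard(values_a, values_b):
    """share of distinct values two columns have in common (0.0 to 1.0)."""
    a, b = set(values_a), set(values_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# invented sample values for three hypothetical columns
last_name = ["smith", "jones", "garcia", "chen"]
surname = ["smith", "garcia", "chen", "patel"]
first_name = ["ana", "li", "omar", "chen"]

overlap_surname = jaccard(last_name, surname)        # 3 shared out of 5 distinct
overlap_first_name = jaccard(last_name, first_name)  # 1 shared out of 7 distinct

print(overlap_surname, overlap_first_name)
```

the names “last_name” and “surname” share very few characters, but the data says they belong together; that is evidence the structures alone can’t give you.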