i’ve been wandering around the apache site and a few different blogs that discuss hadoop, hbase, hive, and so on. apparently there are some commercial dbms-like implementations on top of and around hadoop (such as teradata aster data and greenplum). anyhow, here is a bit more information on data vault and hadoop. if you like these posts, please comment and let me know.
if you are serious about hadoop and relational systems, you really should check out teradata’s aster data systems. as far as i can tell, these are commercial relational systems that pair their own sql engine and metadata with mapreduce-style processing and connect to hadoop. you might also be interested in greenplum. at the moment i don’t know much about either of these, but they seem (from the marketing literature) to be viable solutions.
anyhow, when i looked at the code layers you would need to generate to support data vault directly on hadoop, they seemed a bit “expensive” to maintain. hand-written mapreduce code is nice and very direct (and thus very fast if done properly), but it can be horrible for maintenance or updates, especially when the data model changes, because every model change leads back to code changes and re-compiling.
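to make the maintenance worry concrete, here is a minimal sketch in plain python (not real hadoop code) of what a hand-written map and reduce step for loading a hub might look like. the pipe-delimited source layout, the column positions, and the function names are all assumptions i made up for illustration.

```python
from collections import defaultdict

# hypothetical source layout: customer_number|name|load_date
# these positions are hard-coded, which is exactly the maintenance problem:
# if the source layout or the model changes, the code has to change too.
BUSINESS_KEY_POS = 0
LOAD_DATE_POS = 2


def map_record(line):
    """emit (business key, load date) for one delimited source line."""
    fields = line.rstrip("\n").split("|")
    return fields[BUSINESS_KEY_POS], fields[LOAD_DATE_POS]


def reduce_hub(pairs):
    """keep one hub row per business key, with its earliest load date."""
    grouped = defaultdict(list)
    for key, load_date in pairs:
        grouped[key].append(load_date)
    # iso-formatted dates sort correctly as plain strings
    return {key: min(dates) for key, dates in grouped.items()}


def load_hub(lines):
    """run the map step over every line, then reduce to distinct hub keys."""
    return reduce_hub(map_record(line) for line in lines)
```

notice that adding or moving a single source column means touching the constants and re-testing the whole job, and that is only one hub.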
so i started thinking: what might be available to help with implementing relational database structures on top of the hadoop and mapreduce components? i started looking into hbase and hive. i think there are a few more options out there as well (anyone care to comment?).
the neat thing about hive is that it has a sql and metadata management layer with “relational-like” structures, which means it is more apt to work like a traditional rdbms (only where the metadata and sql access are concerned). the mechanics below that, of course, are still hadoop: hive translates the sql into mapreduce jobs.
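one way to picture why a metadata layer helps: the table definitions become data instead of compiled code. below is a rough sketch, assuming a made-up hub layout, of generating a hive-style create table statement from a list of columns. the table name, column names, and the hive_ddl helper are hypothetical, and i have not run the generated statement against a hive install.

```python
def hive_ddl(table, columns):
    """build a hive-style CREATE TABLE statement from (name, type) pairs."""
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return f"CREATE TABLE IF NOT EXISTS {table} (\n  {cols}\n)"


# hypothetical data vault hub: hash key, business key, load date, record source
HUB_CUSTOMER = [
    ("customer_hkey", "STRING"),
    ("customer_number", "STRING"),
    ("load_date", "STRING"),
    ("record_source", "STRING"),
]

if __name__ == "__main__":
    print(hive_ddl("hub_customer", HUB_CUSTOMER))
```

when the model changes, you edit the column list and regenerate the statement, rather than changing and re-compiling mapreduce code.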
i would suggest you take a look at:
for more information on hive and how to use its capabilities. i will be looking around a bit more to see what other sql-related projects there are for hadoop and mapreduce. eventually i will settle on one and get to installing and playing with it.
hbase seems interesting, but it is a column-oriented store rather than a relational store.
there’s nothing wrong with choosing a column-based store, but the way data models interact with it changes.
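to show what “changes” might mean in practice, here is a small sketch in plain python (no hbase client involved) of flattening one satellite row into the row key plus family:qualifier cells that hbase works with. the key format, the "sat" column family, and the to_hbase_cells helper are assumptions for illustration, not a recommended design.

```python
def to_hbase_cells(business_key, load_date, attributes, family="sat"):
    """flatten one satellite row into (row key, {family:qualifier: value})."""
    # hypothetical row key: business key and load date joined together,
    # so all versions of one key sit next to each other when sorted
    row_key = f"{business_key}|{load_date}"
    # every descriptive attribute becomes its own cell in the column family
    cells = {f"{family}:{name}": str(value) for name, value in attributes.items()}
    return row_key, cells


if __name__ == "__main__":
    key, cells = to_hbase_cells("C1", "2012-01-01", {"name": "alice", "city": "denver"})
    print(key, cells)
```

there are no joins and no fixed column list here, so things a relational model gets from keys and constraints have to be designed into the row key and column families instead.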
please post your comments & questions below.