Futuristic Data Warehouse

if there is such a thing, it would only be futuristic until its features become available and are surpassed. anyhow, i thought i’d take a minute (just for fun) to spell out what i think the ultimate data warehousing/bi system would look like. it’s time to get your wish-list out and add to this set of requirements: this is a brainstorming session, and i hope you will take the time to comment.

attributes of the futuristic data warehouse:

  1. it will be fed 100% in real time; no more batch feeds to deal with.
  2. it will be 100% web-service driven (for inflow and alerts). forget etl/elt and data integration movement tools: all data will arrive via message queues. there will be no tools between the database/data warehouse and the message queue; in other words, the queues will deliver directly to the database system.
  3. it will have real-time messaging alerts
  4. it will accept xml feeds defined by xsds, and will warehouse the xsd structural definitions.
  5. it will offer an mpp, columnar database.
  6. it will have ram-based data (in-ram parts of the database for hot data sets).
  7. it will have olap cubes built into the database engine.
  8. it will have virtual data marts (make them up as you go along, right alongside pre-defined data marts), but all data marts will be virtual. the queries will run faster the more often they are used, as data sets are continually cached, discarded, and rebuilt based on usage. in other words…
  9. it will have dynamic structured virtual data marts (no more “copies” of the data sets needed)
  10. it will have cell-level (row, column, role, and user) data privacy, masking, and protection.
  11. it will allow logical data models to be defined on top of the data set; it will no longer need physical data model definitions.
  12. it will have time-series/temporality built into its engine, and it will automatically mark “current” data sets.
  13. it will have automatic “delta” checking on inserts; you will no longer have to rely on your own logic to do this.
  14. it will have automatic “de-duplication”; it will no longer maintain duplicate records.
  15. it will have automatic sequencing of records
  16. it will have “lazy indexing”: massive numbers of inserts will take much of the compute power, leaving the indexing to “catch up” when the inserts are done.
  17. it will automatically take care of statistics (without needing you to manage and maintain the system).
  18. it can handle batches if necessary, but it shouldn’t be necessary.
  19. it will have “row level locking” in the database against a logical grouping of records. in other words, in one logical model you represent the “data warehouse with history”, and in a second logical model (overlaid on the same data set) you represent the oltp or transactional system. because the data sets are stored in a columnar fashion, the indexing systems are what make the difference in how the data in the database is used (depending on the model or mode of access).
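several of the insert-time features above (items 12 through 15: temporality with “current” marking, delta checking, de-duplication, and sequencing) fit together naturally, so here is a minimal python sketch of how they might interact. it is not how any real engine does this: it assumes every record arrives with a business key (the hypothetical `key` argument), detects a delta by hashing the attributes, and keeps versions in an in-memory list rather than columnar storage. `TemporalTable`, `Version`, and their methods are made-up names for illustration.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import count
from typing import Dict, List, Optional


@dataclass
class Version:
    seq: int           # automatic sequencing: monotonically increasing row number
    key: str           # hypothetical business key identifying the entity
    attrs: dict        # attribute payload as delivered by the feed
    row_hash: str      # content hash used for delta and duplicate detection
    load_ts: datetime  # load timestamp (UTC)
    is_current: bool = True  # temporality: only the newest version is "current"


class TemporalTable:
    """Hypothetical in-memory table that applies delta checking,
    de-duplication, sequencing and current-row marking on every insert."""

    def __init__(self) -> None:
        self._seq = count(1)
        self.versions: List[Version] = []
        self._current: Dict[str, Version] = {}

    @staticmethod
    def _hash(attrs: dict) -> str:
        # Canonical JSON (sorted keys) so attribute order does not change the hash.
        blob = json.dumps(attrs, sort_keys=True, default=str).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()

    def insert(self, key: str, attrs: dict) -> bool:
        """Store a new version and return True, or return False if nothing changed."""
        row_hash = self._hash(attrs)
        prev = self._current.get(key)
        if prev is not None and prev.row_hash == row_hash:
            return False  # no delta against the current version: drop the duplicate
        if prev is not None:
            prev.is_current = False  # the previous version becomes history
        version = Version(
            seq=next(self._seq),  # sequence numbers are only consumed by stored rows
            key=key,
            attrs=dict(attrs),
            row_hash=row_hash,
            load_ts=datetime.now(timezone.utc),
        )
        self.versions.append(version)
        self._current[key] = version
        return True

    def current(self, key: str) -> Optional[dict]:
        version = self._current.get(key)
        return version.attrs if version is not None else None


# Usage: the second insert differs only in key order, so it is not stored.
table = TemporalTable()
first = table.insert("cust-1", {"name": "Ann", "city": "Oslo"})
repeat = table.insert("cust-1", {"city": "Oslo", "name": "Ann"})
change = table.insert("cust-1", {"name": "Ann", "city": "Bergen"})
```

comparing only against the current version is a deliberate choice in this sketch: a value that changes back (oslo, then bergen, then oslo again) is stored as a new version because it is a real change over time, while an exact repeat of the current row is dropped.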

these are just some of the things i hope the future database engines have.  how many features do you think you would use?  what other feature sets do you see as critical?

let me know,
dan l


2 Responses to “Futuristic Data Warehouse”

  1. RvS 2010/05/18 at 12:48 am

    Interesting, as always.

    Terms like real-time feeds, message queues, and XML make me wonder whether the futuristic data warehouse (in potential) isn’t already out there. My ‘futuristic data warehouse’ is one that thrives on the queue and is an integral part of the organisation, its processes, and its services. I must admit, it sounds like BPEL.

    I’m curious: where do you fit in the massive and seemingly ever-increasing amount of unstructured data?

  2. dlinstedt 2010/05/18 at 4:44 am

    There are many different technologies out there today, but none of them have “temporality” built into the core engine (taking care of it seamlessly). Regarding unstructured data stores, I’ll write another blog entry on that one… It’s an interesting topic that Bill Inmon and I have discussed many times.

    The short answer is: unstructured data needs to have “structured pointers” and/or tags that define its context and content. Otherwise the two (structured and unstructured) can’t mix and don’t join together. Unstructured data should physically live (and remain) in the file system, while the “results” of mining the unstructured data fit into the structured world and carry pointers back to the unstructured data source. The results of the mining are what direct the pointers to the proper place.
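    The “structured pointer” idea can be sketched as a small table of mining results. This minimal Python sketch assumes that mining is naive keyword matching (a stand-in for real text mining), that customer ids appear in the document text, and that a file URI is an acceptable pointer; `DocPointer`, `mine`, `KEYWORDS`, and the sample customer are all hypothetical. The document itself stays in the file system; only the tags, the pointer, and a structured key enter the warehouse, and that key is what lets the two worlds join.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import FrozenSet, Optional, Set

KEYWORDS = ("complaint", "refund", "invoice")  # hypothetical tag vocabulary


@dataclass(frozen=True)
class DocPointer:
    """Structured row describing an unstructured document that stays on disk."""
    uri: str                    # pointer back to the file system; the file is not copied
    tags: FrozenSet[str]        # context/content tags produced by mining
    customer_id: Optional[str]  # structured key that lets the row join to warehouse data


def mine(path: Path, known_customers: Set[str]) -> DocPointer:
    # Stand-in for real text mining: naive keyword tagging plus customer-id lookup.
    text = path.read_text(encoding="utf-8").lower()
    tags = frozenset(word for word in KEYWORDS if word in text)
    customer = next((c for c in sorted(known_customers) if c.lower() in text), None)
    return DocPointer(uri=path.resolve().as_uri(), tags=tags, customer_id=customer)


# Usage: a structured customer row joined to the mining result for one email.
customers = {"C042": {"name": "Acme Ltd", "segment": "enterprise"}}
with tempfile.TemporaryDirectory() as folder:
    doc = Path(folder) / "email_001.txt"
    doc.write_text("Customer C042 is asking for a refund on invoice 17.", encoding="utf-8")
    ptr = mine(doc, set(customers))

joined = {**customers[ptr.customer_id], "doc": ptr.uri, "tags": sorted(ptr.tags)}
```

    Without the mined `customer_id`, the pointer row would have nothing to join on, which is the “can’t mix” problem described above.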

    BPEL is something completely different: it’s a data movement engine, the “ETL” of the business world. It describes business logic and business process workflows very differently from a traditional ETL engine. It isn’t a data warehouse, and it doesn’t store data over time.

    Hope this helps,
    Dan L
