if there is such a thing, it would only be futuristic until the features are available and surpassed. anyhow, i thought i’d take a minute (just for fun) to spell out what i think would be the ultimate data warehousing/bi system. it’s time to get your wish-list out and add to this set of requirements. this is a brainstorming session, and i hope you will take the time to comment.
attributes of the futuristic data warehouse:
- it will be 100% real-time feeds, no more batch feeds to deal with
- it will be 100% web-service driven (for inflow and alerts). forget etl/elt and data integration/movement tools; all data will arrive via message queues. there will be no tools between the message queue and the database/data warehouse. in other words, the queues will deliver directly to the database system.
- it will have real-time messaging alerts
- it will allow xml through xsd feeds, and will warehouse the xsd structural definitions.
- it will offer an mpp, columnar-based database
- it will have ram-based data (in-ram parts of the database for hot data sets).
- it will have olap cubes built into the database engine
- it will have virtual data marts (make them up as you go along, right alongside pre-defined data marts), but all data marts will be virtual. the queries will run faster as they get used more often: data sets will continue to be cached, discarded, and re-built based on usage. in other words…
- it will have dynamic structured virtual data marts (no more “copies” of the data sets needed)
- it will have cell-level (row, column, role, and user based) data privacy, masking, and protection.
- it will allow logical data models to be defined on top of the data set; it will no longer need physical data model definitions
- it will have time-series / temporality built into its engine; it will automatically mark “current” data sets.
- it will have automatic “delta” checking on inserts, you will no longer have to rely on logic to do this
- it will have automatic “de-duplication”, it will no longer maintain duplicate records
- it will have automatic sequencing of records
- it will have “lazy indexing”. in other words, during massive insert loads most of the compute power goes to the inserts, and the indexing is left to “catch up” once the inserts are done.
- it will automatically take care of statistics (not needing you to manage and maintain the system)
- it can handle batches, if necessary – but it shouldn’t be necessary.
- it will have “row-level locking” in the database against a logical grouping of records. in other words, in one logical model you represent the “data warehouse with history”, and in a second logical model (overlaid on the same data set) you represent the oltp or transactional system. because the data sets are stored in a columnar fashion, the indexing systems are what make the difference in how the data gets used (depending on the model or mode of access).
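to make a few of the items above concrete (automatic “delta” checking on inserts, de-duplication, automatic sequencing, and marking “current” records), here is a minimal python sketch of how an engine might behave. everything in it is hypothetical: the names `DeltaStore`, `Version`, and `insert` are made up for illustration, and comparing a sha-256 hash of the payload is just one assumed way of detecting a change, not a description of any real database engine.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Version:
    """one stored version of a record (hypothetical structure)."""
    seq: int              # engine-assigned sequence number
    key: str              # business key of the record
    payload: dict         # the attribute values
    digest: str           # hash of the payload, used for delta checks
    loaded_at: datetime   # load timestamp, gives the temporal ordering
    is_current: bool = True


@dataclass
class DeltaStore:
    """in-memory stand-in for an engine with built-in delta checking."""
    versions: list = field(default_factory=list)
    current: dict = field(default_factory=dict)  # key -> current Version
    next_seq: int = 1

    @staticmethod
    def _digest(payload: dict) -> str:
        # canonical json so that key order does not count as a change
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def insert(self, key: str, payload: dict):
        """store the payload only if it differs from the current version.

        returns the new Version, or None when the insert was a duplicate.
        """
        digest = self._digest(payload)
        previous = self.current.get(key)
        if previous is not None and previous.digest == digest:
            return None  # de-duplication: nothing changed, nothing stored
        if previous is not None:
            previous.is_current = False  # history is kept, just un-marked
        version = Version(
            seq=self.next_seq,
            key=key,
            payload=dict(payload),
            digest=digest,
            loaded_at=datetime.now(timezone.utc),
        )
        self.next_seq += 1  # automatic sequencing
        self.versions.append(version)
        self.current[key] = version
        return version
```

with this sketch, inserting the same customer record twice stores it once, and inserting a changed record adds a second version, flips the first one to non-current, and hands out the next sequence number. the loading code never has to write that comparison logic itself, which is the whole point of the wish.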
these are just some of the things i hope the future database engines have. how many features do you think you would use? what other feature sets do you see as critical?
let me know,