ETL is Dead! Long Live ELT

Recently at the WWDVC conference I claimed that ETL is dead. I believe this to be the case, but before you object, let me explain the exact context in which I made that statement.

Have you heard of big data?

The notion of having too much data to work with is a common theme these days, although only the largest customers seem to be actually executing on the vision of big data. Why? Because they have asked the right questions, i.e. “how do we get value from big data?” and “what does it mean to support it with the right infrastructure and processing capabilities?”

The whole notion of NoSQL and big data is to “move the processing” to where the data lives. Otherwise, it becomes mathematically impossible to scale out. You see, there is a mathematical (not hypothetical, not theoretical) upper limit to the amount of information that can be pulled out of a data store, pushed through a pipe (transformed), and pushed back to a target in a given window of time.

As the data set grows and grows, it becomes increasingly difficult to handle the volume in the pipeline.
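To make the difference in data movement concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table names, the cents-to-dollars transformation, and the row counters are all hypothetical and chosen only for illustration. SQLite runs in-process, so nothing actually crosses a network here; the row counts simply stand in for what would have to travel through the pipe in a client/server setup.

```python
import sqlite3

# Hypothetical tables for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount_cents INTEGER)")
conn.execute("CREATE TABLE orders_etl (id INTEGER, amount_dollars REAL)")
conn.execute("CREATE TABLE orders_elt (id INTEGER, amount_dollars REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)",
    [(i, i * 250) for i in range(1, 1001)],
)

# ETL style: every row leaves the store, is transformed in the client,
# and is pushed back. Rows moved grows linearly with the data volume.
rows = conn.execute("SELECT id, amount_cents FROM raw_orders").fetchall()
transformed = [(row_id, cents / 100.0) for row_id, cents in rows]
conn.executemany("INSERT INTO orders_etl VALUES (?, ?)", transformed)
rows_moved_etl = len(rows) + len(transformed)

# ELT style: only the SQL statement is sent; the engine transforms the
# rows where they already live, so no rows pass through the client.
conn.execute(
    "INSERT INTO orders_elt "
    "SELECT id, amount_cents / 100.0 FROM raw_orders"
)
rows_moved_elt = 0

etl_result = conn.execute("SELECT * FROM orders_etl ORDER BY id").fetchall()
elt_result = conn.execute("SELECT * FROM orders_elt ORDER BY id").fetchall()

print("same result:", etl_result == elt_result)
print("rows through the client, ETL:", rows_moved_etl)
print("rows through the client, ELT:", rows_moved_elt)
```

Both paths produce the same target table. The only thing that changes is how much data has to leave the store and come back.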

Wait, 90% or more of organizations still use ETL…

Correct. I never said there wasn’t value, but the value isn’t in the process of “ETL”; the value is in the metadata that describes the processing to be done. That said, those engines that can translate their “ETL processing” into “ELT” will continue to weather the storm, and will provide scalable transformations in big data environments.

What I mean is that data lineage is still critical, and metadata-driven transformation is still critical. It is only the processing of transformations inside an ETL engine that is dead. Hence: ETL is dead as an implementation, and if it’s not dead in your organization, it will be when you start working with big data or huge volumes.

Tools that are *unable* to “push logic” into the database layers, or to generate native map/reduce components, will fail under volume in the near future, and those tool vendors will suffer the consequences.
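As a rough illustration of what “pushing logic” into the database can look like, the sketch below keeps the transformation as metadata and renders it into a single SQL statement that the engine runs itself. This is not how any particular tool works; the mapping format, the table and column names, and the `generate_elt_sql` and `lineage` helpers are all hypothetical. It uses Python's sqlite3 module, and it assumes the metadata comes from a trusted repository, since the expressions are interpolated directly into SQL.

```python
import sqlite3

# Hypothetical mapping metadata: target column -> source expression.
MAPPING = {
    "source": "raw_customers",
    "target": "clean_customers",
    "columns": {
        "customer_id": "id",
        "full_name": "TRIM(first_name) || ' ' || TRIM(last_name)",
        "country": "UPPER(country_code)",
    },
}


def generate_elt_sql(mapping):
    """Render the mapping as one INSERT ... SELECT for the engine to run."""
    targets = ", ".join(mapping["columns"])
    exprs = ", ".join(mapping["columns"].values())
    return (
        f"INSERT INTO {mapping['target']} ({targets}) "
        f"SELECT {exprs} FROM {mapping['source']}"
    )


def lineage(mapping):
    """Derive (target column, source table, expression) from the same metadata."""
    return [
        (f"{mapping['target']}.{col}", mapping["source"], expr)
        for col, expr in mapping["columns"].items()
    ]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_customers "
    "(id INTEGER, first_name TEXT, last_name TEXT, country_code TEXT)"
)
conn.execute(
    "CREATE TABLE clean_customers "
    "(customer_id INTEGER, full_name TEXT, country TEXT)"
)
conn.execute("INSERT INTO raw_customers VALUES (1, ' Ada ', 'Lovelace', 'gb')")

sql = generate_elt_sql(MAPPING)
conn.execute(sql)  # the transformation runs inside the database
result = conn.execute("SELECT * FROM clean_customers").fetchall()

print(sql)
print(result)
for record in lineage(MAPPING):
    print(record)
```

The same metadata drives both the generated statement and the lineage records, which is the point made above: the metadata and lineage survive, and only the place where the transformation executes changes.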

ELT, from a processing standpoint, is the *only* way forward.

If this were not true, then map/reduce would not exist for big data solutions.
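For readers who have not worked with map/reduce, here is a deliberately simplified, single-process sketch of the idea in Python. The partitions, the record values, and the function names are hypothetical; a real framework would run the map step on the nodes that hold each partition and would handle shuffling, scheduling, and failures. The sketch only shows the shape of the computation: the work goes to the data, and only small partial results come back.

```python
from collections import Counter
from functools import reduce

# Hypothetical partitions standing in for data blocks on separate nodes.
partitions = [
    ["etl", "elt", "elt"],
    ["elt", "metadata"],
    ["etl", "lineage", "elt"],
]


def map_partition(records):
    """Would run where the partition lives; returns a small partial summary."""
    return Counter(records)


def reduce_counts(left, right):
    """Merge two partial summaries into one."""
    return left + right


# Only the per-partition summaries are combined, never the raw records.
partials = [map_partition(p) for p in partitions]
totals = reduce(reduce_counts, partials, Counter())

print(dict(totals))
```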

Hope this helps clear the air,

ETL truly is dead. Metadata and lineage are not dead; ELT is the future.


Dan Linstedt
