I thought it was about time I waded into the whole ELT & ETL dialogue with some ‘fun facts’, so without further ado…
- both ELT and ETL have been around since the late 1980s/early 1990s (ELT is *not* new)
- historically, ‘black box’ ETL engines (e.g. SSIS, Informatica, Datastage etc.) outnumbered ELT deployments
- ELT attempts to take advantage of the (typically parallel) processing power of the target DW platform to deliver the ‘T’ (Transform) part of the process
- ELT and ETL are effectively the same thing…they both take inputs, optionally make some changes, and apply deltas to the DW target
- both ELT and ELT can be more accurately described as ELTL as they both load data twice
- ‘ETL style’ ELTL receives Extracts from one/more sources, Loads the data to an ETL server, applies the Transforms on the ETL server, then Loads the changes to the DW target
- ‘ELT style’ ELTL receives Extracts from one/more source(s), Loads the data to a staging/landing zone within the DW target, applies the Transforms to a work/transform zone within the DW, then Loads the changes from the work/transform zone to the DW core schema
- the only real difference between ETL and ELT is where/when the ‘T’ takes place
- for ETL the ‘T’ happens via a dedicated external ETL engine prior to the DW load
- for ELT the ‘T’ happens within the DW in between the load/receive and apply phases
- the adoption of scalable cloud-based DW engines that can support the ELT approach is why the ‘ETL v ELT’ debate has become ‘a thing’ recently
For the record, I’m a ‘SQL-only ELT’ kinda guy.
I developed my first Teradata ‘ELT-style’ ETL processes whilst still a graduate trainee in 1989/1990. The same approach has since served me well on Netezza, Greenplum, Redshift and BigQuery.