5/5/17

Extract Transform Load

ETL (extract, transform, and load) is the process of extracting data from the source, transforming into required format and loading into the target database.
Extract
Extraction is the process of selecting/fetching data from source table or source file. As we know that the data will be coming from different types of sources (heterogeneous source) to data warehouse database. In that case, the extraction refers all type of source formats.The few source formats include a relational database, flat file, excel file and XML file.

The few source formats include a relational database, flat file, excel file and XML file.This would be very first step in ETL process and very critical for business since the right decision requires right data to be fetched from the right place (source table or file).
This would be very first step in ETL process and very critical for business since the right decision requires right data to be fetched from the right place (source table or file).There are three types of extraction based on business requirements,
There are three types of extraction based on business requirements,
  • Extracting entire table or file data
  • Extracting only the part of data with applying filter condition
  • Extracting data only which have undergone modifications for incremental load
Transform
We cannot consume the transaction data’s as such, there will be manual interventions required to tweak the data for doing analysis and reporting. We need to modify the data based on how we want to see them in analysis view or report. The manual intervention on extracted data is called transformation.
Every business requirement will be called as transformation rule. It plays a vital role in improving the data quality by performing aggregations, filter, data cleaning, surrogate key generation and developing referential integrity.

Load
Loading the transformed form of data into the target database, the target could be operational data store (ODS) or data mart or data warehouse database. This would be the final step in ETL process.
There are two types of load full load and incremental load.
The big challenge in this stage is handling the volume of data, directly it affects the job performance. But the ETL tools are offering a provision to overcome this situation by using partitioning concept.

1 comment:

  1. Well written, and some great points.

    ReplyDelete