What is ETL?



ETL stands for Extract-Transform-Load, the process by which data is moved from source systems into a data warehouse. Data is extracted from an OLTP database, transformed to match the data warehouse schema, and loaded into the data warehouse database. Many data warehouses also incorporate data from non-OLTP sources such as text files, legacy systems, and spreadsheets.
Let's see how it works.
For example, consider a retail store with different departments such as sales, marketing, and logistics. Each department handles customer information independently, and the way each stores that data is quite different: the sales department stores it by customer name, while the marketing department stores it by customer ID.
If the store now wants to check a customer's history and find out which products he or she bought in response to different marketing campaigns, the task would be very tedious.
The solution is to use a data warehouse that stores information from the different sources in a uniform structure, populated using ETL. ETL can transform dissimilar data sets into a unified structure. BI tools can then be used to derive meaningful insights and reports from this data.
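To make the retail example concrete, here is a minimal Python sketch of bringing the two departments' records into one structure. The field names, the sample rows, and the idea that marketing rows also carry the customer name (so the name can bridge the two keys) are all assumptions made for illustration, not details from the store described above.

```python
# Hypothetical records: sales keys customers by name, marketing by customer ID.
sales_rows = [
    {"customer_name": "Ann Lee", "product": "Kettle"},
    {"customer_name": "Bob Roy", "product": "Lamp"},
]
marketing_rows = [
    {"customer_id": 101, "name": "Ann Lee", "campaign": "Spring Sale"},
    {"customer_id": 102, "name": "Bob Roy", "campaign": "Newsletter"},
]


def unify(sales, marketing):
    """Return one row per sale in a single, uniform structure."""
    # Assumption: the customer name is present in both sources and can be
    # used (after light normalization) to match the two kinds of record.
    by_name = {m["name"].strip().lower(): m for m in marketing}
    unified = []
    for s in sales:
        match = by_name.get(s["customer_name"].strip().lower())
        unified.append({
            "customer_id": match["customer_id"] if match else None,
            "customer_name": s["customer_name"],
            "product": s["product"],
            "campaign": match["campaign"] if match else None,
        })
    return unified


unified_rows = unify(sales_rows, marketing_rows)
```

Once every row has the same shape, a question such as "which products did customer 101 buy, and under which campaign?" becomes a simple filter over `unified_rows`.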
The following road map outlines the ETL process:
  1. Extract
  •  Extract relevant data
  2. Transform
  •  Transform data to DW (Data Warehouse) format
  •  Build keys - A key is one or more data attributes that uniquely identify an entity. The various types of keys are primary key, alternate key, foreign key, composite key, and surrogate key. The data warehouse owns these keys and never allows any other entity to assign them.
  •  Cleansing of data - After the data is extracted, it moves into the next phase: cleaning and conforming. Cleaning handles omissions in the data and identifies and fixes errors. Conforming means resolving conflicts between incompatible data so that they can be used in an enterprise data warehouse. In addition, this step creates metadata that is used to diagnose source-system problems and improve data quality.
  3. Load
  •  Load data into DW (Data Warehouse)
  •  Build aggregates - Creating an aggregate means summarizing data from the fact table and storing the summary in order to improve the performance of end-user queries.
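
The three steps above can be sketched end to end in a few lines of Python. This is only an illustration under stated assumptions: the source is a small in-line CSV rather than a real OLTP database, the warehouse is an in-memory SQLite database, and the table names (`dim_customer`, `fact_sales`, `agg_sales_by_customer`) and the rule of skipping rows with a missing amount are hypothetical choices, not a prescribed design.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real one might come from an OLTP database or a file.
SOURCE_CSV = """customer,product,amount
Ann Lee,Kettle,20.0
ann lee ,Lamp,15.5
Bob Roy,Lamp,
Bob Roy,Kettle,20.0
"""


def extract(text):
    """Extract: read the relevant rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cleanse, conform, and assign warehouse-owned surrogate keys."""
    customer_keys = {}  # conformed customer name -> surrogate key
    facts = []
    for row in rows:
        if not row["amount"].strip():
            continue  # cleansing (simplified): skip rows with a missing amount
        # Conforming: collapse spacing and casing so one customer has one spelling.
        name = " ".join(row["customer"].split()).title()
        # Build keys: the warehouse side assigns the next integer to each new customer.
        key = customer_keys.setdefault(name, len(customer_keys) + 1)
        facts.append((key, row["product"].strip(), float(row["amount"])))
    return customer_keys, facts


def load(conn, customer_keys, facts):
    """Load: write dimension and fact rows, then build an aggregate table."""
    conn.execute("CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE fact_sales (customer_key INTEGER, product TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO dim_customer VALUES (?, ?)",
        [(key, name) for name, key in customer_keys.items()],
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", facts)
    # Build aggregates: pre-summarize the fact table so reports need not rescan it.
    conn.execute(
        """CREATE TABLE agg_sales_by_customer AS
           SELECT customer_key, SUM(amount) AS total_amount, COUNT(*) AS sale_count
           FROM fact_sales GROUP BY customer_key"""
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
keys, facts = transform(extract(SOURCE_CSV))
load(conn, keys, facts)
totals = conn.execute(
    "SELECT customer_key, total_amount FROM agg_sales_by_customer ORDER BY customer_key"
).fetchall()
```

An end-user query that needs per-customer totals can read `agg_sales_by_customer` directly instead of summing `fact_sales` each time, which is the performance benefit the aggregate step is meant to provide.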
