2/15/14

ETL Process

ETL Process
  • Access data dictionaries defining source files
  • Identify sources of data from existing systems
  • Build logical and physical data models for target data
  • Specify business and technical rules for data extraction, conversion and transformation
  • Perform data extraction and transformation and load target databases


Push vs. Pull

Pull :- A Pull strategy is initiated by the Target System. As a part of the Extraction Process, the source data can be pulled from Transactional system into a staging area by establishing a connection to the relational/flat/ODBC sources.
Advantage :- No additional space required to store the data that needs to be loaded into to the staging database
Disadvantage :- Burden on the Transactional systems when we want to load data into the staging database
Push :- A Push strategy is initiated by the Source System. As a part of the Extraction Process, the source data can be pushed/exported or dumped onto a file location from where it can loaded into a staging area.
Advantage :- No additional burden on the Transactional systems when we want to load data into the staging database
Disadvantage :- Additional space required to store the data that needs to be loaded into to the staging database

ETL Process Flow


Static extract = capturing a snapshot of the source data at a point in time

Incremental extract = capturing changes that have occurred since the last static extract




Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies

Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data



Record-level:
  • Selection – data partitioning
  • Joining – data combining
  • Aggregation – data summarization
Field-level: 
  • Single-field – from one field to one field
  • Multi-field – from many fields to one, or one field to many



Refresh mode: bulk rewriting of target data at periodic intervals
Update mode: only changes in source data are written to data warehouse



0 comments:

Post a Comment