11/23/16

ETL Interview Questions

What is ETL?
Extract – Extracting the data from the source system
Transform – Transforming or modifying the data into the format the business requires
Load – Loading the data into the target database
Explain your current project.
Explain your current project here with a clear ETL process; practice this before getting into the interview room.
Start with the objective of the project: whether you are building a data mart, a data warehouse, or an ODS.
Then explain how many source systems there are and whether or not you have a staging environment.
Then mention the ETL tool you are using. Stop there.
What are the transformation types?
Active Transformation
The output record count of the transformation may or may not be equal to the input record count.
For example, when we apply a filter transformation on the age column with the condition age between 25 and 40, only the data that satisfies the condition comes out, so the output count cannot be predicted.
Passive Transformation
The output record count of the transformation is equal to the input record count.
For example, when we apply an expression transformation to concatenate the first name and last name columns, every record comes out, even when those columns have no values.
Connected Transformation
A transformation that is linked to another transformation or to a target component is called connected.
Unconnected Transformation
A transformation that is not linked to any other transformation or target component is called unconnected.
What are the types of load?
Full Load (Initial Load, Bulk Load, or Fresh Load) –
The data loading process performed the very first time; it is also referred to as a bulk load or fresh load.
The job extracts the entire volume of data from the source table or file and loads it into the truncated target table after applying the transformation logic.
Incremental Load (Refresh Load, Daily Load, or Change Data Capture) –
Following the initial full load, only the modified data is updated in the target. The changes are captured by comparing the created or modified date against the last run date of the job.
Only the modified data is extracted from the source: the job looks for changes in the source table against the job run table, and if changes exist, that data alone is extracted and updated in the target without impacting the existing data.
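A minimal sketch of this change-data-capture lookup in Python, assuming a hypothetical customer source table with a modified_date audit column and a job_run control table; every table and column name here is illustrative, not from any particular project.

    import sqlite3

    conn = sqlite3.connect("etl_demo.db")

    # Last successful run date for this job, from the job run control table.
    last_run = conn.execute(
        "SELECT MAX(run_date) FROM job_run WHERE job_name = ?",
        ("customer_load",),
    ).fetchone()[0]

    # Only rows created or modified since that date are extracted;
    # everything else in the target is left untouched.
    changed_rows = conn.execute(
        "SELECT * FROM customer WHERE modified_date > ?",
        (last_run,),
    ).fetchall()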
Name some ETL tools
Informatica PowerCenter
Talend Open Studio
IBM DataStage
SQL Server Integration Services (SSIS)
Ab Initio
Oracle Data Integrator
SAS Data Integration Studio
SAP BusinessObjects Data Integrator
CloverETL
Pentaho Data Integration

Explain the scenarios for testing a source-to-staging table load.
- Verify the table structure of the staging table (columns, data types, lengths, constraints, indexes)
- Verify a successful workflow (ETL job) run
- Verify the data count between the source and staging tables (see the sketch after this list)
- Verify the data comparison between the source and staging tables
- Verify duplicate data checking; duplicate data should not be loaded into the staging table
- Verify that excess trailing space is trimmed for all varchar columns
- Verify job consistency by performing a subsequent run
- Verify the job's behavior on failed runs
- Verify the job re-run success scenario after failure correction
- Verify the job run with bad data (NULL values, values exceeding precision, missing lookup or reference data)
- Verify the job performance timing
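A minimal sketch of the count and duplicate checks from the list above, assuming hypothetical customer_src and customer_stg tables with a customer_id key, both reachable through one SQLite connection.

    import sqlite3

    conn = sqlite3.connect("etl_demo.db")

    # Count check: staging should hold exactly what the source holds.
    src_count = conn.execute("SELECT COUNT(*) FROM customer_src").fetchone()[0]
    stg_count = conn.execute("SELECT COUNT(*) FROM customer_stg").fetchone()[0]
    assert src_count == stg_count, f"count mismatch: {src_count} vs {stg_count}"

    # Duplicate check: no key should appear more than once in staging.
    dups = conn.execute(
        "SELECT customer_id, COUNT(*) FROM customer_stg "
        "GROUP BY customer_id HAVING COUNT(*) > 1"
    ).fetchall()
    assert not dups, f"duplicate keys loaded: {dups}"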
How do you ensure that all source table data is loaded into the target table?
- Using the SET operator MINUS – if both the source and target tables are on the same database server (see the sketch after this list)
- Using an Excel macro – both source and target table data are copied into Excel and compared with a macro
- Using automation tools – the tool fetches the data and compares it internally with its own algorithm
- Using utility tools – develop an automation utility tool using Java or any scripting language along with database drivers
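A rough utility-script sketch of the MINUS approach, again with hypothetical customer_src and customer_tgt tables; note that Oracle spells the operator MINUS, while SQLite, SQL Server, and PostgreSQL use EXCEPT.

    import sqlite3

    conn = sqlite3.connect("etl_demo.db")

    # Rows in source but not in target, and vice versa; both result
    # sets must be empty for the two tables to match exactly.
    missing_in_target = conn.execute(
        "SELECT * FROM customer_src EXCEPT SELECT * FROM customer_tgt"
    ).fetchall()
    extra_in_target = conn.execute(
        "SELECT * FROM customer_tgt EXCEPT SELECT * FROM customer_src"
    ).fetchall()
    assert not missing_in_target and not extra_in_target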
Give an example of a low severity and high priority defect.
- There is a requirement that an email notification be triggered in case of job failure
- During testing, a deviation is found: the email notification is received, but the record count in its content does not match
Low severity – it does not affect any functionality
High priority – the wrong data count gives the wrong picture to the management team
What are the components of Informatica?
It is one of the major ETL tools worldwide, used mainly for ETL, data masking, and data quality.
It has four major components:
  1. Repository Manager – for adding repositories and managing folders
  2. Designer – for creating mappings
  3. Workflow Manager – for creating workflows with tasks and mappings
  4. Workflow Monitor – for tracking workflow run status
What are tasks available in Informatica?
Below are the major tasks available in the Informatica PowerCenter tool.
1. Session
2. Email
3. Command
4. Control
5. Decision
6. Timer
Database testing vs ETL testing
ETL Testing – making sure that the data from source to target is loaded properly, along with the business transformation rules.
Database Testing – testing whether the data is stored properly in the database when we perform operations from the front end or back end, along with testing procedures, functions, and triggers; also testing whether the data is retrieved properly in the UI.
What is partitioning?
Partitioning is the concept of running parallel threads by distributing records among them. It is used when the volume of data is huge, which directly impacts the data load and other transformation progress.
Databases and ETL tools offer this partitioning concept to improve the job execution time for high-volume data tables.
Below are the types of partitioning available in the Informatica PowerCenter tool (a toy sketch of two of them follows this list):
1. Pass-through
2. Database partitioning
3. Key range
4. Round robin
5. Hash auto key
6. Hash user key
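A toy Python illustration of how two of these schemes distribute rows across parallel partitions; it mimics the idea only and is not how Informatica implements them internally.

    # Round robin deals rows out evenly; hash key sends rows with the
    # same key value to the same partition (real tools use a stable hash).
    rows = [{"id": i, "region": r} for i, r in enumerate("NNSSEWEN")]
    n_partitions = 3

    round_robin = [rows[i::n_partitions] for i in range(n_partitions)]

    hash_key = [[] for _ in range(n_partitions)]
    for row in rows:
        hash_key[hash(row["region"]) % n_partitions].append(row)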
What are the responsibilities of an ETL tester?
  • Understanding requirements
  • Estimation
  • Planning
  • Test case preparation
  • Test execution
  • Giving sign-off
What does a mapping document contain?
A mapping document contains,
  1. Column mapping between source and target
  2. Data type and length for all columns of source and target
  3. Transformation logic for each column
  4. ETL job or workflow information
  5. Input parameter file information
What kinds of defects can you expect?
  • Table structure issue
  • Index unable to be dropped issue
  • Index not created after the job run
  • Data issue in source table
  • Data count mismatch between source and target
  • Data not matching between source and target
  • Duplicate data loaded issue
  • Trim and NULL issue
  • Data precision issue
  • Date format issue
  • Business transformation rules issue
  • Subsequent job runs not working properly
  • Running the job with bad data does not reject the bad records properly
  • Rollback is not happening in case of job failure
  • Performance issue
  • Log file and content issue
  • Mail notification and content issue
1000 records are in the source table, but only 900 records are loaded into the target table. How do you find the missing 100 records?
- Using the SET operator MINUS – if both the source and target tables are on the same database server (a key-level sketch follows this list)
- Using an Excel macro – both source and target table data are copied into Excel and compared with a macro
- Using an automation tool – the tool fetches the data and compares it internally with its own algorithm
- Using a utility tool – develop an automation utility tool using Java or any scripting language along with database drivers
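To pinpoint which 100 records went missing, a key-level MINUS/EXCEPT is usually enough; a sketch assuming a hypothetical customer_id key and the same illustrative table names as before.

    import sqlite3

    conn = sqlite3.connect("etl_demo.db")

    # Key values present in the source but absent from the target
    # identify exactly the records that failed to load.
    missing_keys = conn.execute(
        "SELECT customer_id FROM customer_src "
        "EXCEPT "
        "SELECT customer_id FROM customer_tgt"
    ).fetchall()
    print(len(missing_keys), "records missing:", missing_keys[:10])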
Can you give a few test cases to test an incremental load table?
Insert a few records and validate the data after the job run
Update non-primary column values and validate the data after the job run
Update primary column values and validate the data after the job run
Delete a few records and validate the data after the job run
Insert/update a few records to create duplicate entries and validate the data after the job run
Update with bad data – NULL values, blank spaces, missing lookup data (a seeding sketch follows this list)
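A sketch of seeding the source table for the insert/update/delete cases above, with hypothetical names; the actual validation happens after the ETL job run, using the comparison techniques shown earlier.

    import sqlite3

    conn = sqlite3.connect("etl_demo.db")

    # Insert case: a brand-new record should appear in the target.
    conn.execute(
        "INSERT INTO customer_src (customer_id, name, modified_date) "
        "VALUES (1001, 'New Customer', DATE('now'))"
    )
    # Update case: change a non-primary column on an existing record.
    conn.execute(
        "UPDATE customer_src SET name = 'Renamed', modified_date = DATE('now') "
        "WHERE customer_id = 42"
    )
    # Delete case: the job should handle removals per the requirement.
    conn.execute("DELETE FROM customer_src WHERE customer_id = 7")
    conn.commit()
    # ...run the ETL job, then compare source and target as shown earlier.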
How do you compare a flat file and a database table?
- Manual sampling method – compare manually on a sampling basis
- Using an Excel macro – the flat file data and target table data are copied into Excel and compared with a macro
- Using an automation tool – the tool fetches the data and compares it internally with its own algorithm
- Using a utility tool – develop an automation utility tool using Java or any scripting language along with database drivers (a sketch follows this list)
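A utility-style sketch comparing a delimited flat file against a table, assuming a hypothetical customer.csv with no header row and the same column order as the table.

    import csv
    import sqlite3

    conn = sqlite3.connect("etl_demo.db")

    # Load both sides into sets of string tuples so row order does not matter.
    with open("customer.csv", newline="") as f:
        file_rows = {tuple(row) for row in csv.reader(f)}
    table_rows = {
        tuple(str(col) for col in row)
        for row in conn.execute("SELECT * FROM customer_tgt")
    }

    print("in file only:", file_rows - table_rows)
    print("in table only:", table_rows - file_rows)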