Responsibilities of an ETL tester

Responsibilities of an ETL tester

Key responsibilities of an ETL tester are segregated into three categories
  • Stage table/ SFS or MFS
  • Business transformation logic applied
  • Target table loading from stage file or table after applying atransformation. 
Some of the responsibilities of an ETL tester are
  • Test ETL software
  • Test components of  ETL datawarehouse
  • Execute backend data-driven test
  • Create, design and execute test cases, test plans and test harness
  • Identify the problem and provide solutions for potential issues
  • Approve requirements and design specifications
  • Data transfers and Test flat file
  • Writing SQL queries3 for various scenarios like count test

ETL Performance Testing and Tuning

Best Practices for ETL Testing

Best Practices for ETL Testing

  1. Make sure data is transformed correctly
  2.  Without any data loss and truncation projected data should be loaded into the data warehouse
  3.  Ensure that ETL application appropriately rejects and replaces with default values and reports invalid data
  4.  Need to ensure that the data loaded in data warehouse within prescribed and expected time frames to confirm scalability and performance
  5.  All methods should have appropriate unit tests regardless of visibility
  6. To measure their effectiveness all unit tests should use appropriate coverage techniques
  7. Strive for one assertion per test case
  8.  Create unit tests that target exceptions

Difference between Database testing and ETL testing

ETL Testing
                          Data Base Testing
Verifies whether data is moved as expected
The primary goal is to check if the data is following the rules/ standards defined in the Data Model
Verifies whether counts in the source and target are matching
Verifies whether the data transformed is as per expectation
Verify that there are no orphan records and foreign-primary key relations are maintained
Verifies that the foreign primary key relations are preserved during the ETL
Verifies that there are no redundant tables and database is optimally normalized
Verifies for duplication in loaded data
Verify if data is missing in columns where required

Types of ETL Bugs

Type of Bugs
User interface bugs/cosmetic bugs
·          Related to GUI of application
·          Font style, font size, colors, alignment, spelling mistakes, navigation and so on
Boundary Value Analysis (BVA) related bug
·         Minimum and maximum values
Equivalence Class Partitioning (ECP) related bug
·          Valid and invalid type
Input/Output bugs
·         Valid values not accepted
·          Invalid values accepted
Calculation bugs
·         Mathematical errors
·         Final output is wrong
Load Condition bugs
·         Does not allows multiple users
·         Does not allows customer expected load
Race Condition bugs
·         System crash & hang
·         System cannot run client platforms
Version control bugs
·         No logo matching
·          No version information available
·         This occurs usually in regression testing
H/W bugs
·         Device is not responding to the application
Help Source bugs
·         Mistakes in help documents

ETL Test Scenarios and Test Cases

Test Scenario
Test Cases

Mapping doc validation
Verify mapping doc whether corresponding ETL information is provided or not.  Change log should maintain in every mapping doc.
1.      Validate the source and target table structure against corresponding mapping doc.
2.      Source data type and target data type should be same
3.      Length of data types in both source and target should be equal
4.      Verify that data field types and formats are specified  
5.      Source data type length should not less than the target data type length
6.      Validate the name of columns in the table against mapping doc.
Constraint Validation
Ensure the constraints are defined for specific table as expected
Data consistency issues

1.      The data type and length for a particular attribute may vary in files or tables though the semantic definition is the same.
2.      Misuse of integrity constraints
Completeness Issues
1.      Ensure that all expected data is loaded into target table.
2.      Compare record counts between source and target.
3.      Check for any rejected records
4.      Check data should not be truncated in the column of target tables
5.      Check boundary value analysis
6.      Compares unique values of key fields between data loaded to WH and source data
Correctness Issues
1.      Data that is misspelled or inaccurately recorded
2.      Null, non-unique or out of range data
Data Quality
1.      Number check: Need to number check and validate it
2.      Date Check: They have to follow date format and it should be same across all records
3.      Precision Check
4.      Data check
5.      Null check
Null Validate
Verify the null values, where “Not Null” specified for a specific column.
Duplicate Check
1.      Needs to validate the unique key, primary key and any other column should be unique as per the business requirements are having any duplicate rows
2.      Check if any duplicate values exist in any column which is extracting from multiple columns in source and combining into one column
3.      As per the client requirements, needs to be ensure that no duplicates in combination of multiple columns within target only
Date Validation
Date values are using many areas in ETL development for
1.      To know the row creation date
2.      Identify active records as per the ETL development perspective
3.      Identify active records as per the business requirements perspective
4.      Sometimes based on the date values the updates and inserts are generated.
Complete Data Validation
1.      To validate the complete data set in source and target table minus a query in a best solution
2.      We need to source minus target and target minus source
3.      If minus query returns any value those should be considered as mismatching rows
4.      Needs to matching rows among source and target using intersect statement
5.      The count returned by intersect should match with individual counts of source and target tables
6.      If minus query returns of rows and count intersect is less than source count or target table then we can consider as duplicate rows are existed.
Data Cleanness
Unnecessary columns should be deleted before loading into the staging area.

How to create ETL Test Case

How to create ETL Test Case

ETL testing is a concept which can be applied to different tools and databases in information management industry.  The objective of ETL testing is to assure that the data that has been loaded from a source to destination after business transformation is accurate.  It also involves the verification of data at various middle stages that are being used between source and destination.

Types of ETL Testing

Types Of Testing
Testing Process

ETL Testing Process

ETL Testing Process

Similar to other Testing Process, ETL also go through different phases. The different phases of ETL testing process is as follows

What is ETL?

What is ETL?

ETL stands for Extract-Transform-Load and it is a process of how data is loaded from the source system to the data warehouse.  Data is extracted from an OLTP database, transformed to match the data warehouse schema and loaded into the data warehouse database.  Many data warehouses also incorporate data from non-OLTP systems such as text files, legacy systems and spreadsheets. 
Let see how it works
For example, there is a retail store which has different departments like sales, marketing, logistics etc.  Each of them is handling the customer information independently, and the way they store that data is quite different. The sales department have stored it by customer’s name, while marketing department by customer id.

What is BI?

What is BI?

Business Intelligence is the process of collecting raw data or business data and turning it into information that is useful and more meaningful.  The raw data is the records of the daily transaction of an organization such as interactions with customers, administration of finance, and management of employee and so on.  These data’s will be used for “Reporting, Analysis, Data mining, Data quality and Interpretation, Predictive Analysis”.

Data Warehousing Concepts

Data Warehouse
1) A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing.
2) A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a format that they can understand and use in a business context.
3) A technique for assembling and managing data from various sources for the purpose of answering business questions, thus making decisions that were previously not possible.
4) It contains historical data derived from transaction data.
5) It can include data from other sources.
6) It is a consistent store of data obtained from a variety of different sources.
7) It made available to end users in a way they can understand and use in a business context.

Why Data Warehouse and It’s testing?

Business Mandate: There are some major mergers and acquisition taking place and huge amount of data migration takes place
Testing is to ensure that data transformation and data movement is correct.

Basic Elements of the Data Warehouse
Characteristics of Data ware House
Subject Oriented: Data warehouses are designed to help you analyze data.
Integrated: Data warehouses must put data from disparate sources into a consistent format.
They must resolve such problems as naming conflicts and inconsistencies among units of measure.
Nonvolatile:  It means the data once entered into the data warehouse then it should not be changed.
Time Variant: It means a data warehouse's focus on change over time.

Goals of a Data Warehouse
It must make an organization’s information more accessible
It must make the organization’s information consistent
It must be adaptive and resilient to change
It must serve as a foundation for improved decision making

Data warehouse Advantages
To provide a consistent common source of information for various cross organizational and functional activity
To Store Large Volumes of Historical Detail Data
Improve the Ability to Access, Report Against, and Analyze Information
To solve or improve upon Business Processes

Why do organizations need a data warehouse?

Ad-hoc Reporting and Analysis

Dynamic presentation through dashboards

Predictive Analysis 
 In business, predictive models utilize the patterns found in historical and transactional data to identify risks and opportunities.
 Guiding in decision making for business expansion, business strategy etc.

Predictive analytics is majorly used in

 1)Financial services
4) Travel
6)Pharmaceuticals etc.

Different Schemas
Data Warehouse environment usually transforms the relational data model into some special architectures called schema.

There are many schema models designed for data warehousing but the most commonly used are:

1) Star schema
2) Snowflake schema

There are 2 types of tables in data warehousing

1) Fact Table
a Fact Table consists of the measurements, metrics or facts of a business process

2) Dimension Table
Dimension tables contain attributes that describe fact records in the fact table.

Warehouse Database - Types

Relational (ROLAP)

Central Warehouse is usually relational because of potentially large size of Data warehouse

Multidimensional (MOLAP)

Faster response to analytical queries and OLAP computations but they have size limitations

Hybrid Architecture (HOLAP)

Uses relational component to support large databases and multidimensional component for fast response to analytical queries

ETL Testing / Data Warehouse Testing Overview

ETL or Data warehouse testing is categorized into four different engagements irrespective of technology or ETL tools used:

• New Data Warehouse Testing – New DW is built and verified from scratch. Data input is taken from customer requirements and different data sources and new data warehouse is build and verified with the help of ETL tools.
• Migration Testing – In this type of project customer will have an existing DW and ETL performing the job but they are looking to bag new tool in order to improve efficiency.
• Change Request – In this type of project new data is added from different sources to an existing DW. Also, there might be a condition where customer needs to change their existing business rule or they might integrate the new rule.
• Report Testing – Report are the end result of any Data Warehouse and the basic propose for which DW is build. Report must be tested by validating layout, data in the report and calculation.

ETL Testing Techniques:

 1) Verify that data is transformed correctly according to various business requirements and rules.
 2) Make sure that all projected data is loaded into the data warehouse without any data loss and truncation.
 3) Make sure that ETL application appropriately rejects, replaces with default values and reports invalid data.
 4) Make sure that data is loaded in data warehouse within prescribed and expected time frames to confirm improved performance and scalability.