2/15/14

ETL Testing

ETL Testing
  • The ETL testing on data warehouse involves the SQL queries in the test case documents. 
  • It is vital to test both the initial loads of the Data Warehouse from the source i.e. when it gets extracted and then updating it on the target table
  • In specific cases, where trouble shooting is required, the intermediate steps needs to be verified as well.
  • So to achieve bug detection, it is very important to set some basic testing rules. They are
    • No Data losses
    • Correct transformation rules
    • Data validation
No Data losses
  • Verify that all expected data gets loaded into the data warehouse. 
  • This includes validating that all records, all fields and the full contents of each field are loaded without any truncation occurs at any step in the process.
  • As and when required negative scenarios are also validated, one of the example is validating special characters etc.
 Correct transformation Rules
  • Ensure that all data is transformed correctly according to business rules, it could be straight move, simple transformation or the complex transformation.
  • The best method could be to pick some sample records, use the “EXACT” formula in the excel worksheet and compare the results to validate data transformations manually. 
  • This should be ideally done once the testers are absolutely clear about the transformation rules in the business specification.
  Data Validation
  • In data warehouse testing, the test execution revolves around the data, so it is important to achieve the degree of excellence for the data and for that we do the data validation for both the data extracted from the source and then getting loaded at the table.
Challenges in ETL Testing
  • Knowing the ERD (Entity Relationship Diagram)
  • Periodic data refreshes to have the production like environment
  • Understating the mappings between source and targets
  • Incompatible and duplicate data.
  • Loss of data during ETL process.
  • Unavailability of inclusive test bed.
  • Testers have no privileges to execute ETL jobs by their own.
  • Volume and complexity of data is very huge.
  • Fault in business process and procedures.
  • Trouble acquiring and building test data.
  • Missing business flow information.
Data Validation
  • Data validation deals with making sure the data is valid (clean, correct and useful). 
  • Data validation procedures use data validation rules (or check routines) to ensure the validity (mostly correctness and meaningfulness) of data. 
  • It also ensures the validity of input data to maintain the security of the system.
  • These rules are automatically implemented through data dictionaries.
  • Data validation can also be implemented through declaring data integrity rules or procedures enforcing business rules (especially in business applications).
  • These business rules are usually captured during the initial business requirements analysis conducted by the business analysts. 
  • It is very important to implement business rules at the beginning of the process, because wrongly validated data usually have a negative impact on the execution of the business process.
  • The simplest form of validation is checking the input to make sure they are made up of characters from the “valid” set.                       
  • For example, a validation process for telephone directory application should validate the input telephone numbers to make sure that they contain only numbers, plus/minus symbols and brackets (and nothing else).  
  • A little more advanced validation processes could also check the country code field to check if they are valid country codes.
Data Verification
  • Data verification is the process of checking a copy of data to make sure that it is exactly equal to the original copy of the data. 
  • Data verification is usually required when you have backed up your data.
  • Most modern backup software have in-built verification functionality
  • Data verification is a very important process as it makes you feel safe because you will be confident that you can actually use the backed up data in case the original data gets lost or corrupted.
  • Verification software usually makes sure the copy is readable as well as the content is matched exactly with the original content. So, it takes more time than a simple backup, but it is well worth the trouble. 
What is the difference between Data Validation and Data Verification?
  • Data validation is usually performed on the original copy or the inputs to the system, while data verification is carried out on the copies (or backups) of data. 
  • Checking the validity of inputs is very quick compared to the lengthy verification processes that occur after backing up.  
  • Validation can be used to protect data from the mistakes made by the users, while verification can be used to protect data from the problems that occur due to system faults.



0 comments:

Post a Comment