Data Warehousing Concepts

Data Warehouse
1) A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing.
2) A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a format that they can understand and use in a business context.
3) A technique for assembling and managing data from various sources for the purpose of answering business questions, thus making decisions that were previously not possible.
4) It contains historical data derived from transaction data.
5) It can include data from other sources.
6) It is a consistent store of data obtained from a variety of different sources.
7) It made available to end users in a way they can understand and use in a business context.

Why Data Warehouse and It’s testing?

Business Mandate: There are some major mergers and acquisition taking place and huge amount of data migration takes place
Testing is to ensure that data transformation and data movement is correct.

Basic Elements of the Data Warehouse
Characteristics of Data ware House
Subject Oriented: Data warehouses are designed to help you analyze data.
Integrated: Data warehouses must put data from disparate sources into a consistent format.
They must resolve such problems as naming conflicts and inconsistencies among units of measure.
Nonvolatile:  It means the data once entered into the data warehouse then it should not be changed.
Time Variant: It means a data warehouse's focus on change over time.

Goals of a Data Warehouse
It must make an organization’s information more accessible
It must make the organization’s information consistent
It must be adaptive and resilient to change
It must serve as a foundation for improved decision making

Data warehouse Advantages
To provide a consistent common source of information for various cross organizational and functional activity
To Store Large Volumes of Historical Detail Data
Improve the Ability to Access, Report Against, and Analyze Information
To solve or improve upon Business Processes

Why do organizations need a data warehouse?

Ad-hoc Reporting and Analysis

Dynamic presentation through dashboards

Predictive Analysis 
 In business, predictive models utilize the patterns found in historical and transactional data to identify risks and opportunities.
 Guiding in decision making for business expansion, business strategy etc.

Predictive analytics is majorly used in

 1)Financial services
4) Travel
6)Pharmaceuticals etc.

Different Schemas
Data Warehouse environment usually transforms the relational data model into some special architectures called schema.

There are many schema models designed for data warehousing but the most commonly used are:

1) Star schema
2) Snowflake schema

There are 2 types of tables in data warehousing

1) Fact Table
a Fact Table consists of the measurements, metrics or facts of a business process

2) Dimension Table
Dimension tables contain attributes that describe fact records in the fact table.

Warehouse Database - Types

Relational (ROLAP)

Central Warehouse is usually relational because of potentially large size of Data warehouse

Multidimensional (MOLAP)

Faster response to analytical queries and OLAP computations but they have size limitations

Hybrid Architecture (HOLAP)

Uses relational component to support large databases and multidimensional component for fast response to analytical queries

ETL Testing / Data Warehouse Testing Overview

ETL or Data warehouse testing is categorized into four different engagements irrespective of technology or ETL tools used:

• New Data Warehouse Testing – New DW is built and verified from scratch. Data input is taken from customer requirements and different data sources and new data warehouse is build and verified with the help of ETL tools.
• Migration Testing – In this type of project customer will have an existing DW and ETL performing the job but they are looking to bag new tool in order to improve efficiency.
• Change Request – In this type of project new data is added from different sources to an existing DW. Also, there might be a condition where customer needs to change their existing business rule or they might integrate the new rule.
• Report Testing – Report are the end result of any Data Warehouse and the basic propose for which DW is build. Report must be tested by validating layout, data in the report and calculation.

ETL Testing Techniques:

 1) Verify that data is transformed correctly according to various business requirements and rules.
 2) Make sure that all projected data is loaded into the data warehouse without any data loss and truncation.
 3) Make sure that ETL application appropriately rejects, replaces with default values and reports invalid data.
 4) Make sure that data is loaded in data warehouse within prescribed and expected time frames to confirm improved performance and scalability.