5/5/17

Slowly Changing Dimension Types-SCD Type1 Type2 Type3

What is a slowly changing dimension (SCD)?

It is a dimension where the attribute values are getting changed slowly over a certain period of time.

Ways of handling slowly changing dimension:

SCD-Slowly changing dimension SCD-example scenario
There are 3 major ways are available to handle the data load process for an SCD type dimension when any modification happens in the source system.
1. SCD Type 1
-Modifications will be done on the same record
-Here no history of changes will be maintained
2. SCD Type 2
-An existing record will be marked as expired with is_active flag or Expired_date column
-This type allows tracking the history of changes
3. SCD Type 3
-A new value will be tracked as a column
-Here no history of changes will be maintained

Example scenario:

The below employee table stores the city information about where the employee is currently working/living. The employee location will change in slow pace only.
SCD-Slowly changing dimension SCD-example scenario
SCD-Slowly changing dimension SCD-example scenario
Employee 101 is moving Bangalore to Chennai.

Update:

After a certain period of time John is moving to Delhi. To track this change in the dimension table, we have below options.
Type 1 – Update the record same
SCD-Slowly changing dimension SCD-types-1
SCD-Slowly changing dimension SCD-types-1
Type 2 – New record will be inserted with is_active=1, old record is_active=0
SCD-Slowly changing dimension SCD-types-2
SCD-Slowly changing dimension SCD-types-2
Type 2 – New record will be inserted with Expired_date=NULL, old record Expired_date= SYSDATE or SYSDATE-1
Type 3 – New column will be added and tracked the both old and new value
SCD-Slowly changing dimension SCD-types-3
SCD-Slowly changing dimension SCD-types-3

Delete scenario:

After a certain period of time, John is resigned from the company. To track this change in the dimension table, we have below options.
Type 1 – Record will be deleted
Type 2 – The record will be expired by setting is_active=0
Type 2 – The record will be expired by setting expired_date= SYSDATE or SYSDATE-1
Type 3 – All the values will be set to NULL

Advantages and Disadvantages of each SCD Type:

SCD Type1
Advantages -> No additional memory is required
Disadvantage -> We cannot trace back to the history of modifications
SCD Type2
Advantages -> You can trace back to the history
Disadvantages -> The memory is getting consumed since keeping old records
SCD Type3
Advantages -> You can trace back to the history
Disadvantages -> The memory is getting consumed since adding new columns. We cannot add columns to keep every time changes, there is a limitation in number of changes keeping

Key Factors to choose SCD type
SCD-Slowly changing dimension SCD-example scenario

Factor
SCD Type
I don’t want to keep the old records
Type1
I want to keep old records but not ready of adding additional columns
Type2
I want to keep old records irrespective of number of changes
Type3
I am more concern about memory size
Type1
I am not worrying about memory size
Type2/Type3

1 comment:

  1. Really nice post. Thanks for sharing with us..

    ReplyDelete