What is a slowly changing dimension (SCD)?
It is a dimension where the attribute values are getting changed slowly over a certain period of time.
Ways of handling slowly changing dimension:
There are 3 major ways are available to handle the data load process for an SCD type dimension when any modification happens in the source system.
1. SCD Type 1
-Modifications will be done on the same record
-Here no history of changes will be maintained
2. SCD Type 2
-An existing record will be marked as expired with is_active flag or Expired_date column
-This type allows tracking the history of changes
3. SCD Type 3
-A new value will be tracked as a column
-Here no history of changes will be maintained
Example scenario:
The below employee table stores the city information about where the employee is currently working/living. The employee location will change in slow pace only.
Employee 101 is moving Bangalore to Chennai.
Update:
After a certain period of time John is moving to Delhi. To track this change in the dimension table, we have below options.
Type 1 – Update the record same
Type 2 – New record will be inserted with is_active=1, old record is_active=0
Type 2 – New record will be inserted with Expired_date=NULL, old record Expired_date= SYSDATE or SYSDATE-1
Type 3 – New column will be added and tracked the both old and new value
Delete scenario:
After a certain period of time, John is resigned from the company. To track this change in the dimension table, we have below options.
Type 1 – Record will be deleted
Type 2 – The record will be expired by setting is_active=0
Type 2 – The record will be expired by setting expired_date= SYSDATE or SYSDATE-1
Type 3 – All the values will be set to NULL
Advantages and Disadvantages of each SCD Type:
SCD Type1
Advantages -> No additional memory is required
Disadvantage -> We cannot trace back to the history of modifications
SCD Type2
Advantages -> You can trace back to the history
Disadvantages -> The memory is getting consumed since keeping old records
SCD Type3
Advantages -> You can trace back to the history
Disadvantages -> The memory is getting consumed since adding new columns. We cannot add columns to keep every time changes, there is a limitation in number of changes keeping
Key Factors to choose SCD type
Factor |
SCD Type
|
I don’t want to keep the old records |
Type1
|
I want to keep old records but not ready of adding additional columns |
Type2
|
I want to keep old records irrespective of number of changes |
Type3
|
I am more concern about memory size |
Type1
|
I am not worrying about memory size |
Type2/Type3
|