What is a Junk dimension?
When a dimension has only Boolean columns except the primary or surrogate key is called junk dimension. It can be derived from dimension or fact table.
Consider a dimension table contains more than one Boolean column. All Boolean columns will be grouped into a single table along with the surrogate key.
Structure of Junk dimension:
Typically it has a surrogate key and Boolean columns.
Surrogate key
Boolean column1
. .
. .
Boolean columnN
Steps to form Junk dimension:
- Identify the Boolean attributes in a fact or dimension table
- Create new dimension table for above-identified attribute column along with surrogate key
- Remove the columns from fact or dimension table
- Create a reference key in the fact table for this new dimension
- Develop ETL jobs to load
Advantage of this Junk dimension:
The main advantage of this dimension is the data memory consumption is very less because of storing unique Boolean combinations.
Example:
Consider a schema contains a dimension called “Product”. It has two Boolean columns “Is_veg” and “Is_domestic”.
Based on junk dimension concept we can remove these two Boolean columns from the product dimension and create a new dimension with these two Boolean columns. Name the table as Flag dimension.
Create a surrogate key to keeping it for referential integrity in Flag dimension.
Create a reference key in Product dimension or Amount fact table. If the schema type is star schema, then the mapping would be
For snowflake schema type, connect the flag dimension with product dimension by foreign key reference.