Partitioning in Informatica


Partitioning is a technique that splits the data into subsets and processes them in parallel threads. It is used when the data volume is large enough to slow down data loads and transformation steps.
Databases and ETL tools offer partitioning to improve job execution time for high-volume tables.
Below are the partition types available in Informatica PowerCenter:
  1. Pass-through
    Rows are not redistributed: each partition’s rows stay in the same partition as they pass through the partition point, so the data assigned to a pipeline is processed by that pipeline only.
    We can use this type when we don’t want to change how the data is distributed across partitions.
  2. Database partitioning
    The ETL tool reads the partitioning defined at the database level and applies the same logic to distribute the data across partitions.
    We can use this type when the tables are already properly partitioned at the database level.
  3. Key range
    We specify a start range and an end range for each partition, and rows are distributed based on the range their key value falls into.
    We can use key range when we are sure about the data range and it won’t change in future. It also fits well when the same key range partitioning is applied in the target database.
    For example, age between 0-25, 26-50, 51-75 and 76-125.
  4. Round robin
    The tool itself distributes the rows evenly across partitions, so that each partition gets roughly the same number of records.
    We can use round robin when we are not sure about the data volume and do not need to group the data.
  5. Hash auto-keys
    The tool picks the partition key automatically from the group-by columns, so rows with the same group values go to the same partition. Make sure the group-by columns are set up correctly before applying this partition type.
    We can use auto keys when the data grouping is already in place before the partitioning step.
  6. Hash user keys
    We specify the group-by columns manually, and the tool distributes the data based on those columns.
    We can use user keys when we want to choose the grouping columns ourselves.
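
To make the difference between the distribution strategies concrete, here is a minimal Python sketch of round robin, key range and hash-key distribution. It is an illustration only, not how PowerCenter works internally: in the tool these types are selected in the session properties rather than written as code. The function names, the sample rows and the use of `crc32` as the hash function are all assumptions made for this example, and the ranges are treated as inclusive to match the age example above.

```python
from zlib import crc32

# Hypothetical sample rows, used only for illustration.
ROWS = [
    {"id": 1, "age": 12, "dept": "A"},
    {"id": 2, "age": 30, "dept": "B"},
    {"id": 3, "age": 64, "dept": "A"},
    {"id": 4, "age": 90, "dept": "C"},
    {"id": 5, "age": 25, "dept": "B"},
    {"id": 6, "age": 51, "dept": "C"},
]

# Inclusive (start, end) ranges taken from the age example.
AGE_RANGES = [(0, 25), (26, 50), (51, 75), (76, 125)]


def round_robin(rows, num_partitions):
    """Deal rows out one by one so partition sizes differ by at most one."""
    partitions = [[] for _ in range(num_partitions)]
    for index, row in enumerate(rows):
        partitions[index % num_partitions].append(row)
    return partitions


def key_range(rows, key, ranges):
    """Send each row to the partition whose range contains its key value."""
    partitions = [[] for _ in ranges]
    for row in rows:
        for index, (start, end) in enumerate(ranges):
            if start <= row[key] <= end:
                partitions[index].append(row)
                break
        else:
            # No range matched: the ranges do not cover this value.
            raise ValueError(f"{key}={row[key]} is outside all ranges")
    return partitions


def hash_keys(rows, keys, num_partitions):
    """Send rows with the same key values to the same partition.

    crc32 is a stand-in hash chosen for this sketch; it is stable
    across runs, unlike Python's built-in hash() for strings.
    """
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        token = "|".join(str(row[k]) for k in keys).encode("utf-8")
        partitions[crc32(token) % num_partitions].append(row)
    return partitions


if __name__ == "__main__":
    for name, parts in [
        ("round robin", round_robin(ROWS, 3)),
        ("key range", key_range(ROWS, "age", AGE_RANGES)),
        ("hash keys", hash_keys(ROWS, ["dept"], 3)),
    ]:
        print(name, [[row["id"] for row in part] for part in parts])
```

Round robin balances the row counts but ignores the row content, key range depends on ranges that must cover every value, and hash keys keep all rows of one group together at the cost of possibly uneven partition sizes.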
