Incremental aggregation

This concept is related to the Aggregator transformation. When you have data that keeps growing while the existing data remains constant, you can use incremental aggregation to produce the output faster. When you select the Incremental Aggregation option in the Session properties, Informatica saves the result of the last run in the cache and reuses that value in the next run instead of recalculating it, which improves performance. To understand this concept, let's look at an example.

Consider that you have a file containing the salaries of employees, and you wish to get the sum of the salaries of all the employees. Suppose the file has three employees in the month of JAN, six employees in the month of FEB, and nine employees in the month of MARCH, as shown in the following screenshot:

[Screenshot: Incremental aggregation]

As you can see, the data in the file is increasing: the first file has the data of employees present in the month of JAN, the second file has the data of employees in the month of FEB, and the third file has the data for MARCH. To get the sum of the salaries of all the employees, we will use the Aggregator transformation. As the number of records increases, the time taken for the calculation also increases. Note, however, that the previous data does not change; only new data is added to the file. To save time, we use incremental aggregation. This option is present in the session task, as shown in the screenshot of the Properties tab in the preceding Tabs of Session task section.

When you run the file for the month of JAN, the Aggregator transformation calculates the sum of the three records and gives the corresponding output, which is 6000 in our case. If you do not check the Incremental Aggregation option, Informatica recalculates all six records in the file for the month of FEB and gives you the result, which is 9000 in our case. If you use the Incremental Aggregation option, the aggregator cache saves the value of the last run, which is 6000. When you run the same process for the month of FEB, Informatica replaces the first three records of the file with the value stored in the cache and adds only the new records to get the result. This makes the calculation faster, as fewer records need to be processed.
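The following Python sketch mimics the behavior described above. It is only a conceptual illustration, not Informatica code: the `cache` dictionary, the function names, and the individual salary values are assumptions, chosen so that the totals match the 6000 and 9000 figures in the example.

```python
def full_sum(salaries):
    """Non-incremental run: every record in the file is aggregated again."""
    return sum(salaries)


def incremental_sum(cache, salaries):
    """Incremental run: reuse the cached total and add only the new records.

    `cache` is a hypothetical dict that remembers the total and the number
    of records processed in earlier runs.
    """
    new_records = salaries[cache["count"]:]   # records not seen before
    cache["total"] += sum(new_records)        # cached value + new records
    cache["count"] = len(salaries)
    return cache["total"], len(new_records)


# Assumed salary values; only the totals come from the example in the text.
jan = [2000, 2000, 2000]            # three employees
feb = jan + [1000, 1000, 1000]      # six employees, first three unchanged

cache = {"total": 0, "count": 0}
jan_total, jan_read = incremental_sum(cache, jan)
feb_total, feb_read = incremental_sum(cache, feb)

print(jan_total, jan_read)
print(feb_total, feb_read)
print(full_sum(feb))
```

In the FEB run the incremental version adds only three records to the cached value, while `full_sum` has to read all six to arrive at the same total.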

The basic criterion in order to use incremental aggregation is that the data from the previous run should remain the same.

If the records from the previous run change, the result will be incorrect, as Informatica will not consider the changed values and will use the value stored in the cache instead. To handle this, check the Reinitialize aggregate cache box. When you check this option, Informatica reinitializes the aggregate cache and stores the new value. Note that you need to uncheck the Reinitialize aggregate cache option again if your data is not changing; otherwise, Informatica will reinitialize the cache on every run, which defeats the purpose of incremental aggregation and hurts performance.
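To see why changed records produce a wrong result, consider this small Python sketch. It is a conceptual model only: the `reinitialize` parameter stands in for the Reinitialize aggregate cache option, and the cache structure and salary values are assumptions made for illustration.

```python
def run_aggregation(cache, salaries, reinitialize=False):
    """Sum salaries using a cached total from earlier runs.

    With reinitialize=True the hypothetical cache is cleared first,
    so every record in the file is aggregated again.
    """
    if reinitialize:
        cache["total"] = 0
        cache["count"] = 0
    new_records = salaries[cache["count"]:]
    cache["total"] += sum(new_records)
    cache["count"] = len(salaries)
    return cache["total"]


cache = {"total": 0, "count": 0}

jan = [2000, 2000, 2000]                 # assumed values, total 6000
run_aggregation(cache, jan)

# In FEB the first salary was changed from 2000 to 2500 (assumed change),
# and three new employees were added.
feb_changed = [2500, 2000, 2000, 1000, 1000, 1000]

stale_total = run_aggregation(cache, feb_changed)
fresh_total = run_aggregation(cache, feb_changed, reinitialize=True)

print(stale_total)   # cached 6000 is reused, so the change is missed
print(fresh_total)   # cache rebuilt from all six records
```

The first FEB run reuses the cached JAN value and misses the changed salary; the run with `reinitialize=True` rebuilds the cache from the full file and matches the actual sum of the records.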
