The Rank transformation

The Rank transformation returns a specified number of records from the top or bottom of a data set, ordered by a port that you choose. For example, suppose you need the five highest-paid employees from the EMPLOYEE table. You can add a Rank transformation to the mapping and set its properties accordingly. A sample mapping that includes the Rank transformation is shown in the following screenshot:

[Screenshot: The Rank transformation]

When you create a Rank transformation, a default output port named RANKINDEX is created with it. Using the RANKINDEX port is optional. In this mapping, we have connected the RANKINDEX port to the target because we want to record the rank of each employee based on SALARY.

When you use a Rank transformation, you need to define the port on which you wish to rank the data. As shown in the following screenshot, we have ranked the data based on SALARY:

[Screenshot: The Rank transformation]

You cannot rank the data on multiple ports; only one port can serve as the rank port. In the Properties tab, you also need to select either the Top or Bottom option and specify the number of records you wish to rank. In our case, we have selected Top and 5 to implement the scenario, as shown in the following screenshot:

[Screenshot: The Rank transformation]

The Rank transformation accepts data row by row and stores it in a cache. Once all the data has been received, the transformation evaluates the cached rows against the rank condition and sends the qualifying rows to the output ports.
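The Rank transformation is configured through its properties rather than written as code, so the following Python sketch is only an analogy for the behavior described above. It buffers every incoming row (standing in for the cache) and then selects the top or bottom N rows by one port. The function name `rank_rows`, its parameters, and the sample rows are assumptions made for illustration, not part of the product.

```python
import heapq


def rank_rows(rows, port, number_of_ranks, top=True):
    """Buffer all rows, then return the N highest (or lowest) by one port."""
    cached = list(rows)  # stand-in for the cache: nothing is emitted until all rows arrive
    pick = heapq.nlargest if top else heapq.nsmallest
    return pick(number_of_ranks, cached, key=lambda row: row[port])


# Hypothetical sample rows; the IDs and salaries are made up.
employees = [
    {"EMPLOYEE_ID": 1, "SALARY": 100},
    {"EMPLOYEE_ID": 2, "SALARY": 1000},
    {"EMPLOYEE_ID": 3, "SALARY": 500},
    {"EMPLOYEE_ID": 4, "SALARY": 600},
    {"EMPLOYEE_ID": 5, "SALARY": 700},
    {"EMPLOYEE_ID": 6, "SALARY": 800},
    {"EMPLOYEE_ID": 7, "SALARY": 900},
]

top_five = rank_rows(employees, "SALARY", 5, top=True)
bottom_two = rank_rows(employees, "SALARY", 2, top=False)

print([row["SALARY"] for row in top_five])    # [1000, 900, 800, 700, 600]
print([row["SALARY"] for row in bottom_two])  # [100, 500]
```

The `top` flag plays the role of the Top or Bottom option, and `number_of_ranks` plays the role of the number of records to rank.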

The Rank transformation can also rank data within groups. In the next section, we will talk about the group by option available in the Rank transformation.

Group by ranking

The Rank transformation can also rank data within a particular group. Consider the scenario discussed previously, but now suppose we need the five highest-paid employees from each department. To achieve this, select the group by option on the port that identifies the department, as shown in the following screenshot:

[Screenshot: Group by ranking]
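As a rough analogy for group by ranking, the following Python sketch splits the rows by a group port and then takes the top N rows within each group. The function `rank_by_group`, the column name `DEPARTMENT_ID`, and the sample rows are hypothetical; the example uses the top two per department only to keep the data short.

```python
from collections import defaultdict
import heapq


def rank_by_group(rows, group_port, rank_port, number_of_ranks):
    """Return the top N rows by rank_port within each distinct group_port value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_port]].append(row)
    return {
        key: heapq.nlargest(number_of_ranks, members, key=lambda r: r[rank_port])
        for key, members in groups.items()
    }


# Hypothetical sample rows; the department IDs, names, and salaries are made up.
rows = [
    {"DEPARTMENT_ID": 10, "NAME": "A", "SALARY": 500},
    {"DEPARTMENT_ID": 10, "NAME": "B", "SALARY": 900},
    {"DEPARTMENT_ID": 10, "NAME": "C", "SALARY": 700},
    {"DEPARTMENT_ID": 20, "NAME": "D", "SALARY": 400},
    {"DEPARTMENT_ID": 20, "NAME": "E", "SALARY": 1000},
]

top_two = rank_by_group(rows, "DEPARTMENT_ID", "SALARY", 2)
for department, members in top_two.items():
    print(department, [(m["NAME"], m["SALARY"]) for m in members])
```

Each department is ranked independently, so a salary that would not make the overall top N can still qualify within its own group.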

Next, we will talk about the default port of the Rank transformation, the rank index.

Rank index

When you create a Rank transformation, a default port called the rank index (RANKINDEX) is created. This port generates numbers indicating the rank of each row. The port is optional; if you do not wish to use the rank index, you can leave it unconnected.

Suppose you have the following data belonging to the SALARY column in the source:

Salary
100
1000
500
600
1000
800
900

When you pass the data through a Rank transformation and define a condition to get the top five salaried records, the Rank transformation generates the rank index as indicated here:

Rank_Index, Salary
1,1000
1,1000
3,900
4,800
5,600

As you can see, the rank index assigns rank 1 to both records that share the highest salary, skips rank 2, and assigns rank 3 to the next salary. So if the source contains five records with a salary of 1000 along with other values, and you define the condition to get the top five salaries, the Rank transformation returns all five records with a salary of 1000 and rejects all the others.
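The numbering shown above, where tied values share a rank and the following rank is skipped, can be reproduced with a short Python sketch. This is an illustration of the numbering scheme under the assumption that exactly N rows are kept; it is not the transformation's actual implementation, and the function name `rank_with_index` is made up.

```python
def rank_with_index(values, number_of_ranks):
    """Return (rank_index, value) pairs for the top N values; ties share a rank."""
    ordered = sorted(values, reverse=True)[:number_of_ranks]
    ranked = []
    for position, value in enumerate(ordered, start=1):
        if ranked and ranked[-1][1] == value:
            rank = ranked[-1][0]  # tie: reuse the rank of the previous row
        else:
            rank = position       # otherwise the rank is the row's position
        ranked.append((rank, value))
    return ranked


salaries = [100, 1000, 500, 600, 1000, 800, 900]
result = rank_with_index(salaries, 5)
print(result)  # [(1, 1000), (1, 1000), (3, 900), (4, 800), (5, 600)]

# Five records with the same top salary fill all five slots with rank 1.
all_tied = rank_with_index([1000, 1000, 1000, 1000, 1000, 900, 800], 5)
print(all_tied)  # [(1, 1000), (1, 1000), (1, 1000), (1, 1000), (1, 1000)]
```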

With this, we have learned all the details of Rank transformation.
