The best way to understand how pivoting works is by looking at an example. Refer to the process pivoting.xml provided with this book, which produces the result shown in the following screenshot. It displays an example set containing name-value pairs for each value of id, which in this case can be regarded as an identifier:
The pivot result shows how new attributes have been created for each ID based on the content of the name attribute in the input example set. The number of examples in the result equals the number of unique identifiers in the id column (in this case, three), and the number of new attributes equals the number of unique values in the name column (in this case, four). The name of each new attribute is formed by joining the name of the value column and a value from the name column. Not every ID has a value for every name, so the result can contain missing values, shown as question marks, and these must be handled. In this example, they would probably be set to 0 in a step following the pivot operation.
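To make the counting rules above concrete, here is a minimal sketch of the pivot logic in plain Python. It is not how RapidMiner implements the operator. The input rows are hypothetical and are not the data in pivoting.xml, and the helper name pivot and its keyword arguments are illustrative only. Missing combinations are represented as None, standing in for the question marks, and a final step replaces them with 0.

```python
# Hypothetical name-value rows (NOT the data from pivoting.xml).
rows = [
    {"id": 1, "name": "apple", "value": 3},
    {"id": 1, "name": "banana", "value": 2},
    {"id": 2, "name": "banana", "value": 5},
    {"id": 2, "name": "cherry", "value": 4},
    {"id": 3, "name": "carrot", "value": 1},
    {"id": 3, "name": "apple", "value": 7},
]

def pivot(rows, group="id", index="name", value="value"):
    """Sketch of a pivot: one output row per unique group value.

    Each unique index value becomes a new attribute named
    '<value column>_<index value>'. Combinations that never occur
    stay None (RapidMiner displays these as '?'). If a combination
    repeats, this sketch simply keeps the last value (an assumption).
    """
    names = sorted({r[index] for r in rows})
    result = {}
    for r in rows:
        empty = {f"{value}_{n}": None for n in names}
        out = result.setdefault(r[group], {group: r[group], **empty})
        out[f"{value}_{r[index]}"] = r[value]
    return [result[g] for g in sorted(result)]

pivoted = pivot(rows)
for row in pivoted:
    print(row)

# Follow-up step: replace missing values with 0.
filled = [{k: (0 if v is None else v) for k, v in row.items()} for row in pivoted]
```

With three unique ids and four unique names, the sketch yields three rows, each carrying four value_ attributes.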
The parameters required for the Pivot operator to create this result are shown in the following screenshot:
As can be seen in the screenshot, these parameters are very simple. The id attribute is used as the group attribute to group examples together, and the name attribute is used as the index attribute; its values within each group determine the names of the new attributes. For example, in the pivoted data, the attribute value_carrot for id 3 has the value 1. This means that the original data contains a row where id is 3, name is carrot, and value is 1.
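The correspondence described above can be checked in reverse: every non-missing cell in the pivoted data points back to exactly one original row. The sketch below assumes pivoted rows shaped like the Pivot output, with id as the group attribute and name as the index attribute. The rows and the helper original_rows are hypothetical.

```python
# Hypothetical pivoted rows (group attribute "id", index attribute "name").
pivoted = [
    {"id": 1, "value_apple": 3, "value_carrot": None},
    {"id": 3, "value_apple": 7, "value_carrot": 1},
]

def original_rows(pivoted, group="id", prefix="value_"):
    """Recover the (id, name, value) row implied by each non-missing cell."""
    rows = []
    for row in pivoted:
        for attr, v in row.items():
            # Only pivoted attributes carry the prefix; None means '?'.
            if attr.startswith(prefix) and v is not None:
                rows.append({group: row[group], "name": attr[len(prefix):], "value": v})
    return rows

for r in original_rows(pivoted):
    print(r)
```

The cell value_carrot = 1 in the row for id 3 maps back to the row id = 3, name = carrot, value = 1, while the missing value_carrot for id 1 produces no row.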
If other regular attributes are present in the example set before pivoting, they are also included in the result. This can make the result complex, so it is worth eliminating extra attributes beforehand with the Select Attributes operator, or setting their roles to special using the Set Role operator.
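To see why extra attributes make the result grow, the sketch below assumes that each extra regular attribute is pivoted the same way as value, giving one new column per index value. That assumption is a simplification made for illustration. The attribute weight_kg and the helper pivot are hypothetical. Dropping the extra attribute before pivoting, as Select Attributes would, keeps the output narrow.

```python
# Hypothetical input with one extra regular attribute, "weight_kg".
rows = [
    {"id": 1, "name": "apple", "value": 3, "weight_kg": 0.2},
    {"id": 1, "name": "carrot", "value": 2, "weight_kg": 0.1},
    {"id": 2, "name": "apple", "value": 5, "weight_kg": 0.3},
]

def pivot(rows, group, index, pivoted_attrs):
    """Pivot every attribute in pivoted_attrs into '<attr>_<index value>' columns."""
    names = sorted({r[index] for r in rows})
    result = {}
    for r in rows:
        out = result.setdefault(r[group], {group: r[group]})
        for a in pivoted_attrs:
            # Assumption: each pivoted attribute gets one column per index value.
            for n in names:
                out.setdefault(f"{a}_{n}", None)
            out[f"{a}_{r[index]}"] = r[a]
    return [result[g] for g in sorted(result)]

# Extra attribute left in: 2 attributes x 2 names = 4 new columns.
wide = pivot(rows, "id", "name", ["value", "weight_kg"])
# Extra attribute removed first: 1 attribute x 2 names = 2 new columns.
narrow = pivot(rows, "id", "name", ["value"])
print(len(wide[0]) - 1, len(narrow[0]) - 1)  # 4 2
```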
The names produced by the Pivot operator can get quite complicated, and you may find yourself having to rename them using the various Rename operators.
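One common clean-up is to strip the value-column prefix so that each attribute is named after the index value alone. The sketch below uses a regular-expression replacement that is comparable in spirit to a pattern-based rename, but it is only an illustration. The attribute names and the helper simplify are hypothetical.

```python
import re

# Hypothetical attribute names after pivoting "value" (and an extra attribute).
names = ["id", "value_apple", "value_carrot", "weight_kg_apple"]

def simplify(name, prefix="value_"):
    """Remove a known prefix from the start of an attribute name."""
    return re.sub(f"^{re.escape(prefix)}", "", name)

renamed = [simplify(n) for n in names]
print(renamed)  # ['id', 'apple', 'carrot', 'weight_kg_apple']
```

Anchoring the pattern with ^ ensures only a leading prefix is removed, so names that merely contain "value_" elsewhere are left alone.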