Using pivoting

The best way to understand how pivoting works is to look at an example. Refer to the process pivoting.xml provided with this book, which produces the result shown in the following screenshot. The screenshot displays an example set containing name-value pairs, each associated with an id that acts as an identifier:

The pivot result shows how new attributes have been created for each id based on the content of the name attribute in the input example set. The number of examples in the result equals the number of unique identifiers in the id column (in this case, three), and the number of new attributes equals the number of unique values in the name column (in this case, four). The names of the new attributes are formed by joining the name of the value column to each value in the name column, for example, value_carrot. Wherever an id has no entry for a given name, the result contains a missing value, shown as a question mark, and these must be handled. In this example, they would probably be set to 0 in a step following the pivot operation.
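The counting and naming rules described above can be illustrated outside RapidMiner with a few lines of plain Python. This is only a minimal sketch and not the Pivot operator's implementation: the `pivot` function, its parameter names, and the sample rows are invented for illustration (only `value_carrot` for id 3 comes from the text), and `None` stands in for the question mark.

```python
# Hypothetical input data: one (id, name, value) row per example.
rows = [
    {"id": 1, "name": "apple", "value": 5},
    {"id": 1, "name": "banana", "value": 2},
    {"id": 2, "name": "apple", "value": 7},
    {"id": 2, "name": "pear", "value": 4},
    {"id": 3, "name": "carrot", "value": 1},
]


def pivot(rows, group_attribute, index_attribute, value_attribute):
    """Return one dict per group value, with one key per unique index value."""
    index_values = sorted({row[index_attribute] for row in rows})
    # New attribute names join the value column name and each index value.
    columns = [f"{value_attribute}_{v}" for v in index_values]
    result = {}
    for row in rows:
        group = row[group_attribute]
        # Every group starts with all new attributes missing (None plays the role of "?").
        example = result.setdefault(group, {c: None for c in columns})
        example[f"{value_attribute}_{row[index_attribute]}"] = row[value_attribute]
    return result


pivoted = pivot(rows, "id", "name", "value")

# Follow-up step: replace missing values with 0.
filled = {
    group: {name: (0 if value is None else value) for name, value in example.items()}
    for group, example in pivoted.items()
}

print(len(pivoted))     # number of unique ids
print(len(pivoted[3]))  # number of unique names
print(filled[3])
```

With this sample data there are three result examples and four new attributes, matching the counts described in the text.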

The parameters required for the Pivot operator to create this result are shown in the following screenshot:

As the screenshot shows, these parameters are very simple. The id attribute is set as the group attribute, which groups examples together, and the index attribute (here, name) is used to create new attributes whose names are based on the values that attribute takes within each group. For example, in the pivoted data, the attribute value_carrot for id 3 has the value 1. This means that in the original data, there is a row where id is 3, name is carrot, and value is 1.

If other attributes are present in the example set before pivoting, they are also included. This can make the result complex, so it is worth considering the elimination of extra attributes by using Select Attributes or by setting their roles to be special using the Set Role operator.

Tip

If the input example set contains duplicate entries, that is, more than one example with the same id and name, the pivot operation keeps the last one it encounters and discards the earlier ones. To handle this, aggregate the data first so that each combination is unique before it reaches the Pivot operator.
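The effect of duplicates, and one way of aggregating them away beforehand, can be sketched in plain Python. The rows are hypothetical, and summing is only an assumed choice of aggregation; an average, maximum, or count may suit other data better.

```python
from collections import defaultdict

# Hypothetical input with a duplicate (id, name) pair.
rows = [
    {"id": 3, "name": "carrot", "value": 1},
    {"id": 3, "name": "carrot", "value": 4},  # same id and name as the row above
    {"id": 1, "name": "carrot", "value": 2},
]

# Without aggregation, a later row silently overwrites an earlier one.
last_wins = {}
for row in rows:
    last_wins[(row["id"], row["name"])] = row["value"]

# Aggregating first (here by summing) leaves one row per (id, name) pair.
totals = defaultdict(int)
for row in rows:
    totals[(row["id"], row["name"])] += row["value"]

aggregated = [
    {"id": group, "name": name, "value": total}
    for (group, name), total in totals.items()
]

print(last_wins[(3, "carrot")])  # value kept when the last duplicate wins
print(totals[(3, "carrot")])     # value after summing the duplicates
```

The `aggregated` list contains each id and name combination exactly once, so no information is lost silently when it is pivoted.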

The names produced by the Pivot operator can get quite complicated and you may find yourself having to rename them using the various Rename operators.
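As an illustration of the kind of renaming involved, the following plain-Python sketch strips the value_ prefix from pivoted attribute names. The helper `strip_prefix` and the sample example are hypothetical; inside RapidMiner the same transformation would be done with one of the Rename operators.

```python
def strip_prefix(example, prefix):
    """Return a copy of the example with the prefix removed from matching attribute names."""
    return {
        (name[len(prefix):] if name.startswith(prefix) else name): value
        for name, value in example.items()
    }


# Hypothetical pivoted example; None stands in for a missing value.
pivoted_example = {"id": 3, "value_carrot": 1, "value_apple": None}
renamed = strip_prefix(pivoted_example, "value_")

print(renamed)
```

Attributes that do not carry the prefix, such as id, are left unchanged.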
