Aggregation

Aggregation reduces the number of examples by combining them: aggregation rules are applied to the attributes of the combined examples to produce single new attributes. For example, the data shown in the previous screenshot has multiple entries for the same ID. If these were the transactions in a day, aggregation would combine them to give a daily view. The following screenshot shows the required result:

[Screenshot: Aggregation]

The id attribute is used as the column to group examples by (similar to the idea of grouping in SQL). For each group of examples sharing the same ID, the fruit and vegetable values are summed, and each sum becomes a new attribute of a single example in the final example set.
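The grouping and summing described above can be sketched outside the tool in a few lines of Python. This is only an illustration of the idea, not how the operator is implemented; the example values and the helper name aggregate_sum are made up for this sketch, because the data in the screenshot is not reproduced here.

```python
from collections import defaultdict

# Hypothetical example set: several transactions per id (values are invented).
examples = [
    {"id": 1, "fruit": 2, "vegetable": 1},
    {"id": 1, "fruit": 3, "vegetable": 0},
    {"id": 2, "fruit": 1, "vegetable": 4},
    {"id": 2, "fruit": 0, "vegetable": 2},
    {"id": 3, "fruit": 5, "vegetable": 5},
]


def aggregate_sum(rows, group_by, attributes):
    """Group rows by one attribute and sum the listed attributes per group."""
    totals = defaultdict(lambda: {name: 0 for name in attributes})
    for row in rows:
        group = totals[row[group_by]]
        for name in attributes:
            group[name] += row[name]
    # One output example per distinct group value, ordered by that value.
    return [{group_by: key, **totals[key]} for key in sorted(totals)]


daily = aggregate_sum(examples, "id", ["fruit", "vegetable"])
for row in daily:
    print(row)
```

Five input examples become three output examples, one per distinct id, which is the reduction in the number of examples that aggregation is meant to achieve.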

The operator that carries out this aggregation is Aggregate. The parameters needed to implement the required aggregation are shown in the following two screenshots. The first shows the group by dialog box, which handles the grouping of examples. Here, selecting the id attribute groups all examples with the same ID together.

[Screenshot: Aggregation]

The aggregation within a group is then controlled by the aggregation attributes dialog box for this operator. This is shown in the following screenshot:

[Screenshot: Aggregation]

An aggregation attribute and an aggregation function are chosen for each entry. The function is applied to that attribute across all examples in a group, and the result is stored in a new attribute. The name of the new attribute is derived from the aggregation function and the aggregation attribute. In this case, sum is used, so the new attributes will be called sum(fruit) and sum(vegetable).
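The naming rule can be illustrated with a small Python sketch that pairs each aggregation attribute with a function label and builds the new attribute name as label(attribute). The FUNCTIONS table, the aggregate helper, and the sample rows are assumptions made for this illustration; only the sum label and the resulting names sum(fruit) and sum(vegetable) come from the text.

```python
from collections import defaultdict

# Function labels used in this sketch; only "sum" appears in the text,
# "max" is included purely to show that the label drives the name.
FUNCTIONS = {"sum": sum, "max": max}


def aggregate(rows, group_by, aggregations):
    """Apply (attribute, function label) pairs to each group of rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row)

    result = []
    for key in sorted(groups):
        out = {group_by: key}
        for attribute, label in aggregations:
            values = [r[attribute] for r in groups[key]]
            # New attribute name: function label followed by the attribute.
            out[f"{label}({attribute})"] = FUNCTIONS[label](values)
        result.append(out)
    return result


# Invented sample rows for the sketch.
rows = [
    {"id": 1, "fruit": 2, "vegetable": 1},
    {"id": 1, "fruit": 3, "vegetable": 0},
    {"id": 2, "fruit": 1, "vegetable": 4},
]

named = aggregate(rows, "id", [("fruit", "sum"), ("vegetable", "sum")])
print(list(named[0].keys()))
```

Because the name is built from both parts, the same attribute can be aggregated with two different functions without the resulting attribute names colliding.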

An important general point about aggregation is that missing values are often present in the data to be aggregated. The parameters of the operator allow these missing values to be handled.
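To see why this matters, consider the two obvious ways a sum could treat a missing value. The Python sketch below represents a missing value as None and uses a hypothetical ignore_missing flag; it is meant to show the choice involved, and it does not claim to reproduce the operator's actual parameters or defaults.

```python
def sum_with_missing(values, ignore_missing=True):
    """Sum a group's values, where None stands for a missing value.

    ignore_missing=True  : skip missing values and sum the rest.
    ignore_missing=False : any missing value makes the whole result missing.
    """
    present = [v for v in values if v is not None]
    if not present:
        # Nothing to sum, so the result is itself missing.
        return None
    if not ignore_missing and len(present) != len(values):
        return None
    return sum(present)


group = [2, None, 3]  # invented values for one group
skipped = sum_with_missing(group, ignore_missing=True)
propagated = sum_with_missing(group, ignore_missing=False)
print(skipped, propagated)
```

Skipping missing values keeps a usable total but can understate it; propagating them makes the gap visible in the aggregated example set. Which behaviour is appropriate depends on what a missing value means in the data.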
