Renaming attributes

Many operators generate new attributes, and the generated names are usually self-explanatory. Some generated names, however, contain characters such as (, ), =, and -. The Generate Attributes operator cannot handle these names because it interprets the characters as mathematical operators. In this case, rename the attributes to more benign names with the Rename by Generic Names operator, which simply renames attributes to the form att1, att2, att3, and so on. The Generate Attributes operator can then be used in the normal way with the newly generated attribute names.

Renaming can also be done with the simpler Rename operator and the more powerful Rename by Replacing operator. These operators give more control over renaming, which makes it easier to rename the attributes back to their original names. The original names are often needed to help explain the attributes to others.
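The idea of renaming to generic names while keeping a way back to the originals can be sketched outside RapidMiner in a few lines of Python. This is only an illustration of the bookkeeping, not how the operators are implemented; the function name rename_generic and the sample attribute names are hypothetical.

```python
def rename_generic(names, stem="att"):
    """Return generic names (att1, att2, ...) plus a mapping back to the originals."""
    generic = [f"{stem}{i}" for i in range(1, len(names) + 1)]
    back = dict(zip(generic, names))  # generic name -> original name
    return generic, back


# Hypothetical attribute names containing characters such as ( ) and -
original = ["sum(price)", "count(id)", "a-b"]

generic, back = rename_generic(original)

# After the calculations are done, the original names can be restored.
restored = [back[name] for name in generic]
```

Keeping the mapping is what makes the renaming reversible; without it, the meaning of att1, att2, and so on is lost.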

Also, of course, renaming can be used simply to make attributes' names more meaningful and the data easier to understand.

Searching and replacing attribute values

Perhaps it is bad planning, but I often find that I have to make global searches and replacements of attribute values. For example, if a spelling mistake is systematically present in the data and prevents it from being matched to data from other sources, the error must be corrected globally. Note that this is different from renaming attributes. Search and replace is about changing the values of attributes throughout the example set.

A number of operators can help, including Map, Replace, and Replace (Dictionary). Which operator to use depends on how complex the replacement is and how many replacements have to be made.

Using the Map operator

The Map operator is the simplest and is best used to replace whole nominal values with alternatives. For example, if nominal attributes contain color and must be replaced completely with colour, the Map operator is ideal.
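As a rough analogy in Python (not RapidMiner's implementation), whole-value mapping behaves like a dictionary lookup: a value is replaced only when it matches a key exactly. The function name map_values and the sample values are assumptions made for this sketch.

```python
def map_values(values, mapping):
    """Replace each value that exactly matches a key; leave all others unchanged."""
    return [mapping.get(value, value) for value in values]


values = ["color", "size", "the color green"]
mapped = map_values(values, {"color": "colour"})
```

Here the value the color green is left alone because it is not an exact match, which is the situation where a partial replacement is needed instead.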

Using the Replace operator

The Replace operator is used to change parts of an attribute's value. For example, if an attribute holds the value the color green and this needs to become the colour green, use the following regular expression to match attribute values: (.*)color(.*)

Use $1colour$2 to dictate the replacement.

In this, $1 and $2 are numbered capturing groups that correspond to the parts of the nominal either side of the matched item to be replaced.
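The same pattern can be tried in Python's re module as a quick check of the capturing groups. This is a sketch under one notable assumption: Python writes group references in the replacement as \1 and \2 rather than $1 and $2. The function name replace_part is hypothetical.

```python
import re

# Same expression as in the text: everything before and after "color" is captured.
PATTERN = re.compile(r"(.*)color(.*)")


def replace_part(value):
    """Rebuild the value around the match, swapping color for colour."""
    return PATTERN.sub(r"\1colour\2", value)


single = replace_part("the color green")
untouched = replace_part("the size large")

# With two occurrences, the greedy first group swallows the first "color",
# so in Python only the last occurrence is changed by this pattern.
double = replace_part("the color green is one of the colors")
```

In Python, at least, this pattern changes a single occurrence per value, which is worth keeping in mind when a value may contain the target more than once.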

More complex regular expressions are possible. There is a vast number of resources available to assist with the art of regular expressions and some are given in Chapter 10, Debugging.

Using Replace (Dictionary)

The Replace (Dictionary) operator uses word-value pairs in one example set (the data dictionary) to replace words in another. The data dictionary can contain regular expressions, which gives a great deal of flexibility. One good feature of this operator is that, by default, all occurrences of the word to replace are found and replaced. For example, the nominal the color green is one of the colors of the rainbow can be changed to the colour green is one of the colours of the rainbow simply by using a data dictionary that maps color to colour.
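The behaviour described above can be imitated in Python with a plain dictionary of from and to pairs. This is a minimal sketch of the idea, not the operator itself; the function name replace_with_dictionary is hypothetical, and each key is treated as a regular expression.

```python
import re


def replace_with_dictionary(values, dictionary):
    """Apply every from -> to pair to every value, replacing all occurrences."""
    result = []
    for value in values:
        for pattern, replacement in dictionary.items():
            # re.sub replaces every non-overlapping match, not just the first.
            value = re.sub(pattern, replacement, value)
        result.append(value)
    return result


data_dictionary = {"color": "colour"}
values = ["the color green is one of the colors of the rainbow"]
replaced = replace_with_dictionary(values, data_dictionary)
```

Adding more pairs to the dictionary extends the replacement without changing the code, which mirrors the appeal of keeping the replacements in a separate example set.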

The next screenshot shows the example set before the replace operation:

The result of the replace operation is shown in the next screenshot. Each occurrence of the word color is replaced with the British English spelling colour.

To do this, the data dictionary example set is needed, containing attributes to dictate the from and to replacement to be applied. This is shown in the following screenshot:

Finally, the Replace (Dictionary) operator requires parameters to be set up correctly. These are illustrated in the following screenshot:

A mappingAndReplacing.xml process is available with the files that accompany this book. This allows the previous examples to be recreated.