Converting tree labels

As you've seen in previous recipes, parse trees often have a variety of Tree label types that are not present in chunk trees. If you want to use parse trees to train a chunker, then you'll probably want to reduce this variety by converting some of these tree labels to more common label types.

Getting ready

First, we have to decide which Tree labels need to be converted. Let's take a look at that first Tree again:

Immediately, you can see that there are two alternative NP subtrees: NP-SBJ and NP-TMP. Let's convert both of those to NP. The mapping will be as follows:

Original Label	New Label
NP-SBJ	NP
NP-TMP	NP
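
One way to decide which labels need converting is to count the phrase-level labels that actually occur in a tree. The following is a minimal sketch, not part of transforms.py: count_phrase_labels() is a hypothetical helper, and the tree is a small hand-written fragment modeled on the sentence above, so that no corpus download is needed. It assumes that any label starting with NP- is a candidate for conversion to NP.

```python
from collections import Counter
from nltk.tree import Tree

# A small hand-written parse tree (an illustrative fragment, not the full
# treebank sentence), so the example runs without corpus data.
tree = Tree.fromstring(
    "(S (NP-SBJ (NNP Pierre) (NNP Vinken)) "
    "(VP (MD will) (VP (VB join) (NP (DT the) (NN board)) "
    "(NP-TMP (NNP Nov.) (CD 29)))))"
)

def count_phrase_labels(tree):
    """Hypothetical helper: count labels of subtrees above the POS level.

    A subtree of height 2 is a part-of-speech node wrapping a single token,
    so height > 2 keeps only phrase-level subtrees.
    """
    return Counter(st.label() for st in tree.subtrees() if st.height() > 2)

label_counts = count_phrase_labels(tree)

# Assumption: labels of the form NP-<something> are the ones to convert.
np_variants = sorted(label for label in label_counts if label.startswith('NP-'))
mapping = {label: 'NP' for label in np_variants}
print(mapping)
```

Running the same count over more of the corpus would show whether other label variants are worth adding to the mapping.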

How to do it...

The convert_tree_labels() function in transforms.py takes two arguments: the Tree to convert and a label conversion mapping. It returns a new Tree in which every label found in the mapping has been replaced by its mapped value:

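from nltk.tree import Tree

def convert_tree_labels(tree, mapping):
  # Return a new Tree with labels replaced according to mapping.
  children = []

  for t in tree:
    if isinstance(t, Tree):
      # Subtrees are converted recursively.
      children.append(convert_tree_labels(t, mapping))
    else:
      # Leaf tokens are kept as they are.
      children.append(t)

  # Labels without an entry in mapping are left unchanged.
  label = mapping.get(tree.label(), tree.label())
  return Tree(label, children)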

Using the mapping table we saw earlier, we can pass it in as a dict to convert_tree_labels() and convert the first parsed sentence from treebank:

>>> from nltk.corpus import treebank
>>> from transforms import convert_tree_labels
>>> mapping = {'NP-SBJ': 'NP', 'NP-TMP': 'NP'}
>>> convert_tree_labels(treebank.parsed_sents()[0], mapping)
Tree('S', [Tree('NP', [Tree('NP', [Tree('NNP', ['Pierre']), Tree('NNP', ['Vinken'])]), Tree(',', [',']), Tree('ADJP', [Tree('NP', [Tree('CD', ['61']), Tree('NNS', ['years'])]), Tree('JJ', ['old'])]), Tree(',', [','])]), Tree('VP', [Tree('MD', ['will']), Tree('VP', [Tree('VB', ['join']), Tree('NP', [Tree('DT', ['the']), Tree('NN', ['board'])]), Tree('PP-CLR', [Tree('IN', ['as']), Tree('NP', [Tree('DT', ['a']), Tree('JJ', ['nonexecutive']), Tree('NN', ['director'])])]), Tree('NP', [Tree('NNP', ['Nov.']), Tree('CD', ['29'])])])]), Tree('.', ['.'])])

As you can see in the following diagram, the NP-* subtrees have been replaced with NP subtrees:

How it works...

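The convert_tree_labels() function walks the children of the Tree. Each child that is itself a Tree is converted recursively, while leaf tokens are copied unchanged. The label of the current Tree is then looked up in the mapping, falling back to the original label when there is no entry, and a Tree is rebuilt from that label and the converted children. This repeats up the recursion until the entire Tree has been converted.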

The result is a brand new Tree instance with new subtrees whose labels have been converted.
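
To check that the original Tree is left untouched, the following sketch converts a small hand-written tree and compares the two. The function body repeats the recipe's logic so the block runs on its own; the sample tree is an illustrative fragment, not a sentence from the treebank corpus.

```python
from nltk.tree import Tree

def convert_tree_labels(tree, mapping):
    # Same logic as the recipe's function, repeated here for self-containment.
    children = []
    for t in tree:
        if isinstance(t, Tree):
            children.append(convert_tree_labels(t, mapping))
        else:
            children.append(t)
    label = mapping.get(tree.label(), tree.label())
    return Tree(label, children)

# Illustrative tree with two NP variants and one label (VP) not in the mapping.
original = Tree.fromstring(
    "(S (NP-SBJ (NNP Pierre)) (VP (VB join) (NP-TMP (NNP Nov.))))"
)
converted = convert_tree_labels(original, {'NP-SBJ': 'NP', 'NP-TMP': 'NP'})

print(original[0].label())    # label in the input tree
print(converted[0].label())   # label in the converted copy
print(converted[1].label())   # unmapped labels pass through unchanged
```

Because every subtree is rebuilt, the converted Tree shares no Tree objects with the input, so later in-place changes to one do not affect the other.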
