Creating a shallow tree

In the previous recipe, we flattened a deep Tree by only keeping the lowest level subtrees. In this recipe, we'll keep only the highest level subtrees instead.

How to do it...

We'll be using the first parsed sentence from the treebank corpus as our example. Recall from the previous recipe that the sentence Tree looks like this:

[Figure: the full parse Tree of the first sentence in treebank]

The shallow_tree() function defined in transforms.py eliminates all the nested subtrees, keeping only the top subtree labels:

from nltk.tree import Tree

def shallow_tree(tree):
  children = []

  for t in tree:
    if t.height() < 3:
      children.extend(t.pos())
    else:
      children.append(Tree(t.label(), t.pos()))

  return Tree(tree.label(), children)

Using it on the first parsed sentence in treebank results in a Tree with only two subtrees, NP-SBJ and VP, followed by the final period as a tagged token:

>>> from nltk.corpus import treebank
>>> from transforms import shallow_tree
>>> shallow_tree(treebank.parsed_sents()[0])
Tree('S', [Tree('NP-SBJ', [('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD'), ('years', 'NNS'), ('old', 'JJ'), (',', ',')]), Tree('VP', [('will', 'MD'), ('join', 'VB'), ('the', 'DT'), ('board', 'NN'), ('as', 'IN'), ('a', 'DT'), ('nonexecutive', 'JJ'), ('director', 'NN'), ('Nov.', 'NNP'), ('29', 'CD')]), ('.', '.')])

We can see the difference visually in the following diagram, and programmatically by comparing the heights of the two trees:

[Figure: the shallow Tree produced by shallow_tree()]

>>> treebank.parsed_sents()[0].height()
7
>>> shallow_tree(treebank.parsed_sents()[0]).height()
3

As in the previous recipe, the height of the new tree is 3, so it can be used for training a chunker.
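
To make the chunker connection concrete, here is a minimal sketch that converts a height-3 tree into IOB-tagged tokens with tree2conlltags() from nltk.chunk. The tree below is a hypothetical toy sentence built by hand in the same shape that shallow_tree() returns; it is an illustration, not the treebank sentence, and converting to IOB tags is only one possible route to training data.

```python
from nltk.tree import Tree
from nltk.chunk import tree2conlltags

# Hypothetical height-3 tree: chunk subtrees hold (word, tag) tuples,
# and the final period sits directly under the root.
shallow = Tree('S', [
    Tree('NP', [('the', 'DT'), ('board', 'NN')]),
    Tree('VP', [('met', 'VBD'), ('in', 'IN'), ('Paris', 'NNP')]),
    ('.', '.'),
])

# Each token becomes a (word, tag, IOB chunk tag) triple.
iob = tree2conlltags(shallow)
for word, tag, chunk in iob:
    print(word, tag, chunk)
```

Tokens inside a subtree get B- and I- prefixes on the subtree label, and tokens directly under the root get O. A deeper tree would raise a ValueError here, which is why the height needs to be 3.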

How it works...

The shallow_tree() function iterates over each of the top-level subtrees to build the children of a new Tree. If the height() of a subtree is less than 3, it contains only leaves (a single tagged word such as the final period), so it is replaced by the list of part-of-speech tagged tokens returned by pos(). Every other subtree is replaced by a new Tree with the same label whose children are its part-of-speech tagged leaves. This eliminates all nested subtrees while retaining the top-level subtrees.
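
The two branches can be traced on a small tree that needs no corpus download. This is a sketch only: the sentence below is a hypothetical example built by hand, and shallow_tree() is repeated from the listing above so that the snippet runs on its own.

```python
from nltk.tree import Tree

def shallow_tree(tree):
    children = []
    for t in tree:
        if t.height() < 3:
            # Subtree holds only leaves: splice in its (word, tag) tuples.
            children.extend(t.pos())
        else:
            # Deeper subtree: keep its label, flatten everything below it.
            children.append(Tree(t.label(), t.pos()))
    return Tree(tree.label(), children)

# Hypothetical toy sentence with a PP nested inside the VP.
deep = Tree('S', [
    Tree('NP', [Tree('DT', ['the']), Tree('NN', ['board'])]),
    Tree('VP', [
        Tree('VBD', ['met']),
        Tree('PP', [Tree('IN', ['in']),
                    Tree('NP', [Tree('NNP', ['Paris'])])]),
    ]),
    Tree('.', ['.']),
])

shallow = shallow_tree(deep)
print(deep.height())     # 6
print(shallow.height())  # 3
# Labels of the subtrees that survive; the period is now a bare tuple.
print([t.label() for t in shallow if isinstance(t, Tree)])  # ['NP', 'VP']
```

The nested PP and its inner NP disappear, and their words end up directly under VP. The period subtree has a height of 2, so it takes the first branch and becomes a plain ('.', '.') tuple under the root.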

This function is an alternative to flatten_deeptree() from the previous recipe, for when you want to keep the higher-level tree labels and ignore the lower-level labels.

See also

The previous recipe covers how to flatten a Tree and keep the lowest-level subtrees, as opposed to keeping the highest-level subtrees.
