Singularizing plural nouns

As we saw in the previous recipe, the transformation process can result in phrases such as recipes book. This is an NNS followed by an NN, whereas the more natural phrase is recipe book, an NN followed by another NN. We can apply another transform to correct these improper plural nouns.

How to do it...

The transforms.py script defines a function called singularize_plural_noun() that depluralizes a plural noun (tagged with NNS) when it is followed by another noun:

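def singularize_plural_noun(chunk):
  # Index of the first word tagged NNS, or None if there is no plural noun.
  nnsidx = first_chunk_index(chunk, tag_equals('NNS'))

  # Only transform when a following word exists and its tag starts with NN.
  if nnsidx is not None and nnsidx+1 < len(chunk) and chunk[nnsidx+1][1][:2] == 'NN':
    noun, nnstag = chunk[nnsidx]
    # Strip the trailing s from the word and the trailing S from the tag.
    chunk[nnsidx] = (noun.rstrip('s'), nnstag.rstrip('S'))

  return chunk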

Using it on recipes book, we get the more correct form, recipe book:

>>> singularize_plural_noun([('recipes', 'NNS'), ('book', 'NN')])
[('recipe', 'NN'), ('book', 'NN')]
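
The function relies on first_chunk_index() and tag_equals(), which come from earlier recipes and are not repeated in this one. To try the example in isolation, the following sketch supplies simplified stand-ins for both helpers. Their bodies are assumptions made for illustration and may differ from the versions in transforms.py.

```python
def tag_equals(tag):
  # Assumed stand-in: build a predicate that matches (word, tag) pairs by tag.
  def matches(word_tag):
    return word_tag[1] == tag
  return matches

def first_chunk_index(chunk, pred):
  # Assumed stand-in: index of the first item satisfying pred, or None.
  for i, word_tag in enumerate(chunk):
    if pred(word_tag):
      return i
  return None

def singularize_plural_noun(chunk):
  nnsidx = first_chunk_index(chunk, tag_equals('NNS'))

  if nnsidx is not None and nnsidx+1 < len(chunk) and chunk[nnsidx+1][1][:2] == 'NN':
    noun, nnstag = chunk[nnsidx]
    chunk[nnsidx] = (noun.rstrip('s'), nnstag.rstrip('S'))

  return chunk

# A plural noun followed by a noun is singularized.
result = singularize_plural_noun([('recipes', 'NNS'), ('book', 'NN')])

# A plural noun followed by a non-noun is left alone.
unchanged = singularize_plural_noun([('recipes', 'NNS'), ('are', 'VBP')])
```

Note that the function modifies the list it is given and returns that same list, so the caller's chunk is changed in place.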

How it works...

We start by looking for a plural noun with the tag NNS. If found, and if the next word is a noun (determined by making sure the tag starts with NN), then we depluralize the plural noun by removing s from the right side of both the tag and the word. The tag is assumed to be capitalized, so an uppercase S is removed from the right-hand side of the tag, while a lowercase s is removed from the right-hand side of the word.
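
It helps to keep in mind what rstrip() actually does: it removes every trailing occurrence of the given character and knows nothing about English spelling. The short sketch below shows a few inputs where that matters. The strip_one_s() helper is hypothetical and not part of transforms.py. It is only one possible variation, and it still ignores irregular plurals.

```python
# rstrip('s') works for simple plurals...
simple = 'recipes'.rstrip('s')      # 'recipe'

# ...but leaves a stray e on plurals formed with -es...
es_plural = 'boxes'.rstrip('s')     # 'boxe'

# ...and removes every trailing s if a word ending in ss is tagged NNS.
double_s = 'class'.rstrip('s')      # 'cla'

def strip_one_s(word):
  # Hypothetical alternative: drop at most one trailing s.
  return word[:-1] if word.endswith('s') else word

one_s = strip_one_s('class')        # 'clas'
```

For phrases such as recipes book, both approaches give the same result. The difference only shows up on words with unusual endings or on tagging errors.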

See also

The previous recipe shows how a transformation can result in a plural noun followed by a singular noun, though this could also occur naturally in real-world text.
