As we saw in the previous recipe, the transformation process can result in phrases such as recipes book. This is an NNS followed by an NN, when a more proper version of the phrase would be recipe book, which is an NN followed by another NN. We can apply another transform to correct these improper plural nouns.
The transforms.py script defines a function called singularize_plural_noun(), which will depluralize a plural noun (tagged with NNS) that is followed by another noun:
def singularize_plural_noun(chunk):
    nnsidx = first_chunk_index(chunk, tag_equals('NNS'))

    if nnsidx is not None and nnsidx+1 < len(chunk) and chunk[nnsidx+1][1][:2] == 'NN':
        noun, nnstag = chunk[nnsidx]
        chunk[nnsidx] = (noun.rstrip('s'), nnstag.rstrip('S'))

    return chunk
Using it on recipes book, we get the more correct form, recipe book.
>>> singularize_plural_noun([('recipes', 'NNS'), ('book', 'NN')])
[('recipe', 'NN'), ('book', 'NN')]
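The function relies on first_chunk_index() and tag_equals() from transforms.py, which are not shown here. As a minimal sketch for experimenting outside that script, the block below uses simplified stand-ins for both helpers. Their bodies are assumptions written for illustration and may differ from the actual definitions in transforms.py.

```python
def tag_equals(tag):
    """Assumed stand-in: build a predicate matching (word, tag) pairs by tag."""
    def predicate(word_tag):
        return word_tag[1] == tag
    return predicate


def first_chunk_index(chunk, pred):
    """Assumed stand-in: index of the first item satisfying pred, else None."""
    for i, word_tag in enumerate(chunk):
        if pred(word_tag):
            return i
    return None


def singularize_plural_noun(chunk):
    # Find the first plural noun in the chunk.
    nnsidx = first_chunk_index(chunk, tag_equals('NNS'))

    # Only act when a following token exists and its tag starts with NN.
    if nnsidx is not None and nnsidx + 1 < len(chunk) and chunk[nnsidx + 1][1][:2] == 'NN':
        noun, nnstag = chunk[nnsidx]
        chunk[nnsidx] = (noun.rstrip('s'), nnstag.rstrip('S'))

    return chunk


# A plural noun followed by a noun is singularized.
print(singularize_plural_noun([('recipes', 'NNS'), ('book', 'NN')]))
# A plural noun at the end of the chunk is left alone.
print(singularize_plural_noun([('the', 'DT'), ('recipes', 'NNS')]))
```

Note that the list passed in is modified in place and the same list is returned, so pass a copy if the original chunk is still needed.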
We start by looking for a plural noun with the tag NNS. If one is found, and the next word is a noun (determined by checking that its tag starts with NN), then we depluralize the plural noun by removing s from the right side of both the tag and the word. The tag is assumed to be uppercase, so an uppercase S is removed from the right-hand side of the tag, while a lowercase s is removed from the right-hand side of the word.
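One detail of this approach is worth keeping in mind: Python's str.rstrip() removes every trailing occurrence of the given characters, not just one, and it knows nothing about -es or -ies plurals. The short sketch below only illustrates that built-in behavior; the sample words are chosen for illustration and do not come from the recipe.

```python
# Regular -s plural: the result is the expected singular.
print('recipes'.rstrip('s'))   # recipe

# All trailing 's' characters are stripped, not just the last one.
print('glass'.rstrip('s'))     # gla

# -es and -ies plurals are not reduced to their true singular form.
print('classes'.rstrip('s'))   # classe
print('berries'.rstrip('s'))   # berrie

# The same applies to the tag, where an uppercase 'S' is stripped.
print('NNS'.rstrip('S'))       # NN
```

So the transform works well for regular plurals such as recipes, but words with other plural endings would likely need a more careful singularization step.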