Swapping verb phrases

Swapping the words around a verb can eliminate the passive voice from particular phrases. For example, "the book was great" can be transformed into "the great book". This kind of normalization can also help with frequency analysis, because two apparently different phrases are then counted as the same phrase.

How to do it...

transforms.py contains a function called swap_verb_phrase(). It swaps the right-hand side of the chunk with the left-hand side, using the verb as the pivot point. It uses the first_chunk_index() function defined in the previous recipe to find the verb to pivot around.

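def swap_verb_phrase(chunk):
  # find location of verb: any VB* tag except the gerund VBG
  # and the bare base form VB (excluded by len(tag) > 2)
  def vbpred(wt):
    word, tag = wt
    return tag != 'VBG' and tag.startswith('VB') and len(tag) > 2

  # first_chunk_index() is defined in the previous recipe, also in transforms.py
  vbidx = first_chunk_index(chunk, vbpred)

  # no suitable verb found: leave the chunk unchanged
  if vbidx is None:
    return chunk

  # right-hand side first, then left-hand side; the verb itself is dropped
  return chunk[vbidx+1:] + chunk[:vbidx]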
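Because swap_verb_phrase() depends on first_chunk_index() from the previous recipe, the listing above won't run on its own. The following is a minimal self-contained sketch. The first_chunk_index() shown here is an assumed, simplified stand-in (a linear scan that returns the index of the first (word, tag) pair satisfying a predicate, or None), not necessarily the exact version in transforms.py.

```python
def first_chunk_index(chunk, pred):
  # Assumed stand-in for the previous recipe's helper: return the index
  # of the first (word, tag) pair for which pred is true, else None.
  for i, wt in enumerate(chunk):
    if pred(wt):
      return i
  return None

def swap_verb_phrase(chunk):
  # Match VBD, VBN, VBP and VBZ, but not VBG (gerund) or bare VB.
  def vbpred(wt):
    word, tag = wt
    return tag != 'VBG' and tag.startswith('VB') and len(tag) > 2

  vbidx = first_chunk_index(chunk, vbpred)

  if vbidx is None:
    return chunk

  # Right-hand side first, then left-hand side; the verb is dropped.
  return chunk[vbidx+1:] + chunk[:vbidx]

tagged = [('the', 'DT'), ('book', 'NN'), ('was', 'VBD'), ('great', 'JJ')]
print(swap_verb_phrase(tagged))
# [('great', 'JJ'), ('the', 'DT'), ('book', 'NN')]
```

A chunk without a qualifying verb, such as [('great', 'JJ'), ('book', 'NN')], comes back unchanged.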

Now we can see how it works on the part-of-speech tagged phrase "the book was great":

>>> swap_verb_phrase([('the', 'DT'), ('book', 'NN'), ('was', 'VBD'), ('great', 'JJ')])
[('great', 'JJ'), ('the', 'DT'), ('book', 'NN')]

The result is "great the book". This phrase clearly isn't grammatically correct, so read on to learn how to fix it.

How it works...

Using first_chunk_index() from the previous recipe together with the inline vbpred() function, we start by finding the first verb whose tag starts with VB, skipping gerunds (words ending in -ing, tagged VBG). Because vbpred() also requires the tag to be longer than two characters, the bare base-form tag VB is skipped as well. Once we've found the verb, we return the chunk with the right side placed before the left, and the verb itself removed.
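To see exactly which tags the pivot test accepts, the predicate can be run on its own against the Penn Treebank verb tags. This is a standalone copy of the vbpred() logic, lifted out of swap_verb_phrase() purely for illustration; the example words are arbitrary.

```python
def vbpred(wt):
  # Same test as the inline predicate in swap_verb_phrase().
  word, tag = wt
  return tag != 'VBG' and tag.startswith('VB') and len(tag) > 2

# Penn Treebank verb tags, each paired with an example word.
verb_tags = [('read', 'VB'), ('read', 'VBD'), ('reading', 'VBG'),
             ('read', 'VBN'), ('read', 'VBP'), ('reads', 'VBZ')]

for word, tag in verb_tags:
  print(tag, vbpred((word, tag)))
# VB False
# VBD True
# VBG False
# VBN True
# VBP True
# VBZ True
```

Only the past tense, past participle and present tense forms qualify as pivots; the length check is what rules out VB.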

The reason we don't want to pivot around a gerund is that gerunds are commonly used to describe nouns, and pivoting around one would remove that description. The following example shows why skipping the gerund is a good thing:

>>> swap_verb_phrase([('this', 'DT'), ('gripping', 'VBG'), ('book', 'NN'), ('is', 'VBZ'), ('fantastic', 'JJ')])
[('fantastic', 'JJ'), ('this', 'DT'), ('gripping', 'VBG'), ('book', 'NN')]

If we had pivoted around the gerund, the result would be "book is fantastic this", and we'd lose the gerund "gripping".
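The difference is easy to reproduce by comparing the recipe's predicate with a naive one that accepts any tag starting with VB. The helpers swap_around(), naive_pred() and words() below are hypothetical, written only for this comparison; swap_around() inlines the first-match search instead of calling first_chunk_index().

```python
def swap_around(chunk, pred):
  # Pivot around the first (word, tag) pair matching pred, dropping it.
  for i, wt in enumerate(chunk):
    if pred(wt):
      return chunk[i+1:] + chunk[:i]
  return chunk

def naive_pred(wt):
  # Hypothetical: any verb tag, gerunds included.
  return wt[1].startswith('VB')

def vbpred(wt):
  # The recipe's test: skip VBG (and bare VB).
  word, tag = wt
  return tag != 'VBG' and tag.startswith('VB') and len(tag) > 2

def words(chunk):
  # Join just the words, for readable output.
  return ' '.join(word for word, tag in chunk)

chunk = [('this', 'DT'), ('gripping', 'VBG'), ('book', 'NN'),
         ('is', 'VBZ'), ('fantastic', 'JJ')]

print(words(swap_around(chunk, naive_pred)))  # book is fantastic this
print(words(swap_around(chunk, vbpred)))      # fantastic this gripping book
```

With the naive predicate the pivot is "gripping" at index 1, so the gerund disappears; with vbpred() the pivot is "is" at index 3 and "gripping" stays next to "book".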

There's more...

Filtering insignificant words makes the final result more readable. By filtering either before or after swap_verb_phrase(), we get "fantastic gripping book" instead of "fantastic this gripping book":

>>> from transforms import swap_verb_phrase, filter_insignificant
>>> swap_verb_phrase(filter_insignificant([('this', 'DT'), ('gripping', 'VBG'), ('book', 'NN'), ('is', 'VBZ'), ('fantastic', 'JJ')]))
[('fantastic', 'JJ'), ('gripping', 'VBG'), ('book', 'NN')]
>>> filter_insignificant(swap_verb_phrase([('this', 'DT'), ('gripping', 'VBG'), ('book', 'NN'), ('is', 'VBZ'), ('fantastic', 'JJ')]))
[('fantastic', 'JJ'), ('gripping', 'VBG'), ('book', 'NN')]

Either way, we get a shorter grammatical chunk with no loss of meaning.
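The two orders agree because, with its default DT and CC suffixes, the filter only removes determiners and conjunctions, which vbpred() never matches; the same verb stays the pivot, and the slices keep the remaining words in their original order. The sketch below checks this and also illustrates the frequency-analysis benefit from the introduction. It assumes filter_insignificant() drops any word whose tag ends in DT or CC (check transforms.py for the actual default), reuses a simplified stand-in for first_chunk_index(), and adds a hypothetical normalize() helper that counts phrases with collections.Counter.

```python
from collections import Counter

def first_chunk_index(chunk, pred):
  # Assumed stand-in for the helper from the previous recipe.
  for i, wt in enumerate(chunk):
    if pred(wt):
      return i
  return None

def swap_verb_phrase(chunk):
  def vbpred(wt):
    word, tag = wt
    return tag != 'VBG' and tag.startswith('VB') and len(tag) > 2

  vbidx = first_chunk_index(chunk, vbpred)
  if vbidx is None:
    return chunk
  return chunk[vbidx+1:] + chunk[:vbidx]

def filter_insignificant(chunk, tag_suffixes=('DT', 'CC')):
  # Assumed behaviour: drop words whose tag ends in a listed suffix.
  return [(word, tag) for word, tag in chunk
          if not any(tag.endswith(s) for s in tag_suffixes)]

gripping = [('this', 'DT'), ('gripping', 'VBG'), ('book', 'NN'),
            ('is', 'VBZ'), ('fantastic', 'JJ')]
before = swap_verb_phrase(filter_insignificant(gripping))
after = filter_insignificant(swap_verb_phrase(gripping))
print(before == after)  # True

def normalize(chunk):
  # Hypothetical helper: filter, swap, keep only the words.
  return tuple(word for word, tag in swap_verb_phrase(filter_insignificant(chunk)))

phrases = [
  [('the', 'DT'), ('book', 'NN'), ('was', 'VBD'), ('great', 'JJ')],
  [('the', 'DT'), ('great', 'JJ'), ('book', 'NN')],
]

# Both phrases normalize to ('great', 'book').
print(Counter(normalize(p) for p in phrases))
# Counter({('great', 'book'): 2})
```

Filtering also resolves the earlier "great the book" result: once "the" is removed, "the book was great" becomes "great book".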

See also

The previous recipe, Correcting verb forms, defines first_chunk_index(), which is used to find the verb in the chunk.
