Correcting verb forms

It's fairly common to find incorrect verb forms in real-world language. For example, the correct form of "is our children learning?" is "are our children learning?" The verb "is" should only be used with singular nouns, while "are" is for plural nouns, such as "children". We can correct these mistakes by creating verb correction mappings that are applied depending on whether the chunk contains a plural or a singular noun.

Getting ready

We first need to define the verb correction mappings in transforms.py. We'll create two mappings: plural_verb_forms, which maps singular verb forms to plural ones, and singular_verb_forms, which maps plural verb forms to singular ones:

plural_verb_forms = {
  ('is', 'VBZ'): ('are', 'VBP'),
  ('was', 'VBD'): ('were', 'VBD')
}

singular_verb_forms = {
  ('are', 'VBP'): ('is', 'VBZ'),
  ('were', 'VBD'): ('was', 'VBD')
}

Each mapping has a tagged verb that maps to another tagged verb. These initial mappings cover the basics of mapping "is" to "are", "was" to "were", and vice versa.
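The mappings are ordinary dictionaries, so they can be extended with further verb pairs. The following is a minimal sketch, assuming you want to add has/have; these two extra entries are hypothetical and are not part of the original transforms.py. It also shows how a lookup with a fallback default leaves unmapped verbs untouched.

```python
# Mappings as defined in the recipe.
plural_verb_forms = {
    ('is', 'VBZ'): ('are', 'VBP'),
    ('was', 'VBD'): ('were', 'VBD'),
}

singular_verb_forms = {
    ('are', 'VBP'): ('is', 'VBZ'),
    ('were', 'VBD'): ('was', 'VBD'),
}

# Hypothetical additions (an assumption, not in the original transforms.py).
plural_verb_forms[('has', 'VBZ')] = ('have', 'VBP')
singular_verb_forms[('have', 'VBP')] = ('has', 'VBZ')

# dict.get() with the key itself as the default returns unknown verbs as is.
known = plural_verb_forms.get(('has', 'VBZ'), ('has', 'VBZ'))
unknown = plural_verb_forms.get(('runs', 'VBZ'), ('runs', 'VBZ'))

print(known)    # ('have', 'VBP')
print(unknown)  # ('runs', 'VBZ')
```

Keeping the two dictionaries as mirror images of each other means a verb can be corrected in either direction.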

How to do it...

transforms.py contains a function called correct_verbs(). Pass it a chunk with incorrect verb forms and you'll get a corrected chunk back. It uses a helper function, first_chunk_index(), to search the chunk for the position of the first tagged word for which pred returns True. The pred argument should be a callable that takes a (word, tag) tuple and returns True or False. The start and step arguments control where the search begins and in which direction it moves; a negative step searches backwards. Here's first_chunk_index():

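def first_chunk_index(chunk, pred, start=0, step=1):
  # stop past the last item going forward, or before the first going backward
  end = len(chunk) if step > 0 else -1

  for i in range(start, end, step):
    if pred(chunk[i]):
      return i

  # no tagged word satisfied pred
  return None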
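To see how the helper behaves, here is a short sketch that searches a tagged chunk in both directions. The is_noun and is_verb predicates are illustrative names introduced for this example only.

```python
def first_chunk_index(chunk, pred, start=0, step=1):
    # Stop past the last item going forward, or before the first going backward.
    end = len(chunk) if step > 0 else -1
    for i in range(start, end, step):
        if pred(chunk[i]):
            return i
    return None


def is_noun(wt):
    # Illustrative predicate: True when the tag is any noun tag.
    return wt[1].startswith('NN')


def is_verb(wt):
    # Illustrative predicate: True when the tag is any verb tag.
    return wt[1].startswith('VB')


chunk = [('our', 'PRP$'), ('child', 'NN'), ('were', 'VBD'), ('learning', 'VBG')]

print(first_chunk_index(chunk, is_noun))                    # 1
print(first_chunk_index(chunk, is_noun, start=2))           # None
print(first_chunk_index(chunk, is_verb))                    # 2
print(first_chunk_index(chunk, is_verb, start=3, step=-1))  # 3
```

A forward search returns the first match at or after start, while a search with step=-1 walks back towards index 0 and returns the first match it meets on the way.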

For first_chunk_index() to be useful, we need to use a predicate function. In the case of correct_verbs(), the predicate function we need should return True if the tag in the (word, tag) argument starts with a given tag prefix, and False otherwise.

def tag_startswith(prefix):
  def f(wt):
    return wt[1].startswith(prefix)
  return f

The tag_startswith() function takes a tag prefix, such as NN, and returns a predicate function that takes a (word, tag) tuple and returns True if the tag starts with the given prefix. A function that returns another function is called a higher-order function. This is not as complicated as it might sound: just as a function can create and return new values, some programming languages (such as Python) let you define functions inside other functions. In this case, we want a predicate that takes a single argument, (word, tag), but that also has access to a prefix variable. Since first_chunk_index() calls pred with only one argument, we cannot add prefix to the predicate's parameter list. Instead, tag_startswith() builds an inner function that remembers prefix (a closure) while keeping the single (word, tag) argument.
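The sketch below shows the predicates produced by tag_startswith() in use. It also includes one possible alternative built with functools.partial; the tag_has_prefix() helper is a hypothetical name introduced only for comparison and is not part of transforms.py.

```python
from functools import partial


def tag_startswith(prefix):
    def f(wt):
        # f remembers the prefix passed to the enclosing call.
        return wt[1].startswith(prefix)
    return f


# Each call produces an independent single-argument predicate.
is_noun = tag_startswith('NN')
is_verb = tag_startswith('VB')

print(is_noun(('children', 'NNS')))  # True
print(is_noun(('is', 'VBZ')))        # False
print(is_verb(('is', 'VBZ')))        # True


# Hypothetical alternative: a two-argument function with the prefix bound
# in advance by functools.partial.
def tag_has_prefix(prefix, wt):
    return wt[1].startswith(prefix)


is_noun_alt = partial(tag_has_prefix, 'NN')
print(is_noun_alt(('child', 'NN')))  # True
```

Both approaches yield a callable that accepts only the (word, tag) tuple, which is the shape first_chunk_index() expects for pred.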

Now that we have defined first_chunk_index() and tag_startswith(), we can actually implement correct_verbs(). This may seem like overkill for a single function, but we will be using first_chunk_index() and tag_startswith() in subsequent recipes.

def correct_verbs(chunk):
  vbidx = first_chunk_index(chunk, tag_startswith('VB'))
  # if no verb found, do nothing
  if vbidx is None:
    return chunk

  verb, vbtag = chunk[vbidx]
  nnpred = tag_startswith('NN')
  # find nearest noun to the right of verb
  nnidx = first_chunk_index(chunk, nnpred, start=vbidx+1)
  # if no noun found to right, look to the left
  if nnidx is None:
    nnidx = first_chunk_index(chunk, nnpred, start=vbidx-1, step=-1)
  # if no noun found, do nothing
  if nnidx is None:
    return chunk

  noun, nntag = chunk[nnidx]
  # get correct verb form and insert into chunk
  if nntag.endswith('S'):
    chunk[vbidx] = plural_verb_forms.get((verb, vbtag), (verb, vbtag))
  else:
    chunk[vbidx] = singular_verb_forms.get((verb, vbtag), (verb, vbtag))

  return chunk

When we call the preceding function on a part-of-speech tagged "is our children learning" chunk, we get back the correct form, "are our children learning".

>>> from transforms import correct_verbs
>>> correct_verbs([('is', 'VBZ'), ('our', 'PRP$'), ('children', 'NNS'), ('learning', 'VBG')])
[('are', 'VBP'), ('our', 'PRP$'), ('children', 'NNS'), ('learning', 'VBG')]

We can also try this with a singular noun and an incorrect plural verb:

>>> correct_verbs([('our', 'PRP$'), ('child', 'NN'), ('were', 'VBD'), ('learning', 'VBG')])
[('our', 'PRP$'), ('child', 'NN'), ('was', 'VBD'), ('learning', 'VBG')]

In this case, "were" becomes "was" because "child" is a singular noun.

How it works...

The correct_verbs() function starts by looking for a verb in the chunk. If no verb is found, the chunk is returned with no changes. Once a verb is found, we keep the verb, its tag, and its index in the chunk. Then, we look on either side of the verb to find the nearest noun, starting on the right and looking to the left only if no noun is found on the right. If no noun is found at all, the chunk is returned as is. But if a noun is found, then we look up the correct verb form depending on whether or not the noun is plural.
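The branches described above can be exercised with a few small chunks. This is a sketch that repeats the recipe's definitions so that it runs on its own; the sample chunks are made up for illustration.

```python
plural_verb_forms = {
    ('is', 'VBZ'): ('are', 'VBP'),
    ('was', 'VBD'): ('were', 'VBD'),
}
singular_verb_forms = {
    ('are', 'VBP'): ('is', 'VBZ'),
    ('were', 'VBD'): ('was', 'VBD'),
}


def first_chunk_index(chunk, pred, start=0, step=1):
    end = len(chunk) if step > 0 else -1
    for i in range(start, end, step):
        if pred(chunk[i]):
            return i
    return None


def tag_startswith(prefix):
    def f(wt):
        return wt[1].startswith(prefix)
    return f


def correct_verbs(chunk):
    vbidx = first_chunk_index(chunk, tag_startswith('VB'))
    if vbidx is None:
        return chunk
    verb, vbtag = chunk[vbidx]
    nnpred = tag_startswith('NN')
    # Nearest noun to the right first, then to the left.
    nnidx = first_chunk_index(chunk, nnpred, start=vbidx + 1)
    if nnidx is None:
        nnidx = first_chunk_index(chunk, nnpred, start=vbidx - 1, step=-1)
    if nnidx is None:
        return chunk
    noun, nntag = chunk[nnidx]
    forms = plural_verb_forms if nntag.endswith('S') else singular_verb_forms
    chunk[vbidx] = forms.get((verb, vbtag), (verb, vbtag))
    return chunk


# No verb in the chunk: returned unchanged.
print(correct_verbs([('our', 'PRP$'), ('children', 'NNS')]))

# A verb but no noun on either side: returned unchanged.
print(correct_verbs([('is', 'VBZ'), ('learning', 'VBG')]))

# A verb that is not in the mappings: left as it was.
print(correct_verbs([('the', 'DT'), ('children', 'NNS'), ('has', 'VBZ')]))

# The list passed in is modified in place and is also the return value.
chunk = [('is', 'VBZ'), ('our', 'PRP$'), ('children', 'NNS'), ('learning', 'VBG')]
result = correct_verbs(chunk)
print(result is chunk)  # True
print(chunk[0])         # ('are', 'VBP')
```

If the original chunk must be preserved, passing a copy such as list(chunk) avoids the in-place change.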

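Recall from Chapter 4, Part-of-speech Tagging, that plural nouns are tagged with NNS, while singular nouns are tagged with NN. That means we can check the plurality of a noun by looking to see whether its tag ends with S. Once we get the corrected verb form, it replaces the original verb form at the same position in the chunk. The replacement happens in place, so the list passed in is the one that is modified and returned. If the verb is not in the relevant mapping, it is left unchanged.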
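Because the check only looks at the last character of the tag, it treats the plural proper noun tag NNPS the same way as NNS. A small sketch of the check on its own follows; is_plural_noun_tag() is a hypothetical helper named for this example, not a function from transforms.py.

```python
def is_plural_noun_tag(tag):
    # Hypothetical helper: a noun tag counts as plural when it ends in S.
    return tag.startswith('NN') and tag.endswith('S')


for tag in ('NN', 'NNS', 'NNP', 'NNPS'):
    print(tag, is_plural_noun_tag(tag))
# NN False
# NNS True
# NNP False
# NNPS True
```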

See also

The next four recipes all make use of first_chunk_index() to perform chunk transformations.
