Training a Brill tagger

The BrillTagger class is a transformation-based tagger. It is the first tagger that is not a subclass of SequentialBackoffTagger. Instead, the BrillTagger class uses a series of rules to correct the results of an initial tagger. These rules are scored based on how many errors they correct minus the number of new errors they produce.
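To make the idea of a correcting rule concrete, here is a minimal sketch in plain Python. It is not NLTK's implementation. It applies one hand-written rule, "change NN to VB when the previous tag is TO", to the output of a hypothetical initial tagger. The function name apply_rule and the sample sentence are illustrative assumptions.

```python
# Illustrative sketch only (not NLTK's API): one transformation rule applied
# to an initial tagger's output. The rule and the sentence are made up.

def apply_rule(tagged, from_tag, to_tag, prev_tag):
    """Retag from_tag as to_tag wherever the preceding token is tagged prev_tag.

    Conditions are checked against the input tags, so a change made at one
    position never affects whether the rule fires at the next position.
    """
    corrected = list(tagged)  # copy, so the input sentence is left untouched
    for i, (word, tag) in enumerate(tagged):
        if tag == from_tag and i > 0 and tagged[i - 1][1] == prev_tag:
            corrected[i] = (word, to_tag)
    return corrected

# Hypothetical initial-tagger output that mistags "race" as a noun.
initial = [("They", "PRP"), ("want", "VBP"), ("to", "TO"), ("race", "NN"), (".", ".")]
corrected = apply_rule(initial, "NN", "VB", "TO")
print(corrected)
# [('They', 'PRP'), ('want', 'VBP'), ('to', 'TO'), ('race', 'VB'), ('.', '.')]
```

A BrillTagger holds an ordered list of such rules and applies them one after another to whatever its initial tagger produced.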

How to do it...

Here's a function from tag_util.py that trains a BrillTagger using BrillTaggerTrainer. It requires an initial_tagger and train_sents, a list of tagged training sentences.

from nltk.tag import brill, brill_trainer

def train_brill_tagger(initial_tagger, train_sents, **kwargs):
  # Each Template lists the context features a candidate rule may test.
  # Pos looks at part-of-speech tags, Word looks at the words themselves;
  # the numbers are positions relative to the token being retagged.
  templates = [
    # Tag-based templates
    brill.Template(brill.Pos([-1])),
    brill.Template(brill.Pos([1])),
    brill.Template(brill.Pos([-2])),
    brill.Template(brill.Pos([2])),
    brill.Template(brill.Pos([-2, -1])),
    brill.Template(brill.Pos([1, 2])),
    brill.Template(brill.Pos([-3, -2, -1])),
    brill.Template(brill.Pos([1, 2, 3])),
    brill.Template(brill.Pos([-1]), brill.Pos([1])),
    # Word-based templates
    brill.Template(brill.Word([-1])),
    brill.Template(brill.Word([1])),
    brill.Template(brill.Word([-2])),
    brill.Template(brill.Word([2])),
    brill.Template(brill.Word([-2, -1])),
    brill.Template(brill.Word([1, 2])),
    brill.Template(brill.Word([-3, -2, -1])),
    brill.Template(brill.Word([1, 2, 3])),
    brill.Template(brill.Word([-1]), brill.Word([1])),
  ]

  # deterministic=True breaks ties between equally scored rules
  # deterministically, so repeated runs learn the same rules
  trainer = brill_trainer.BrillTaggerTrainer(initial_tagger, templates, deterministic=True)
  # Extra keyword arguments (such as max_rules or min_score) go to train()
  return trainer.train(train_sents, **kwargs)

To use it, we can create our initial_tagger from a backoff chain of NgramTagger classes, then pass that into the train_brill_tagger() function to get a BrillTagger back.

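>>> from nltk.corpus import treebank
>>> from nltk.tag import DefaultTagger, UnigramTagger, BigramTagger, TrigramTagger
>>> from tag_util import backoff_tagger
>>> train_sents = treebank.tagged_sents()[:3000]
>>> test_sents = treebank.tagged_sents()[3000:]
>>> default_tagger = DefaultTagger('NN')
>>> initial_tagger = backoff_tagger(train_sents, [UnigramTagger, BigramTagger, TrigramTagger], backoff=default_tagger)
>>> initial_tagger.evaluate(test_sents)
0.8806820634578028
>>> from tag_util import train_brill_tagger
>>> brill_tagger = train_brill_tagger(initial_tagger, train_sents)
>>> brill_tagger.evaluate(test_sents)
0.8827541549751781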

So the BrillTagger class is slightly more accurate than the initial_tagger: about 88.28% versus 88.07% on test_sents.

How it works...

The BrillTaggerTrainer class takes an initial_tagger argument and a list of templates. These templates must implement the BrillTemplateI interface, which is found in the nltk.tbl.template module. The brill.Template class is such an implementation; it is actually imported from nltk.tbl.template. The brill.Pos and brill.Word classes are subclasses of nltk.tbl.feature.Feature, and they describe which kind of feature a template uses: in this case, the part-of-speech tags or the words at one or more positions.

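The templates specify how to learn transformation rules. For example, brill.Template(brill.Pos([-1])) means that a rule can be conditioned on the previous part-of-speech tag, and brill.Template(brill.Pos([1])) means that a rule can be conditioned on the next part-of-speech tag. A feature with several positions, such as brill.Word([-2, -1]), is satisfied when the required word appears at any one of those positions, so brill.Template(brill.Word([-2, -1])) means a rule can be conditioned on either of the previous two words. To require two conditions at once, a template combines two features, as brill.Template(brill.Word([-1]), brill.Word([1])) does for the previous and the next word.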
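The following sketch shows how a template can turn a single tagging error into candidate rules. It loosely models brill.Pos, brill.Word, and brill.Template in plain Python; the (kind, positions) encoding and the function names are hypothetical and not NLTK's code. It follows the semantics described above, where a multi-position feature holds if its value appears at any one of its positions.

```python
from itertools import product

# Illustrative sketch only, not NLTK's API. A template is a list of
# (kind, positions) features, loosely mirroring brill.Pos / brill.Word:
#   [("pos", [-1])]               ~ brill.Template(brill.Pos([-1]))
#   [("word", [-2, -1])]          ~ brill.Template(brill.Word([-2, -1]))
#   [("pos", [-1]), ("pos", [1])] ~ brill.Template(brill.Pos([-1]), brill.Pos([1]))

def feature_values(tagged, index, kind, positions):
    """Tags or words found at the in-bounds positions relative to index."""
    values = set()
    for offset in positions:
        j = index + offset
        if 0 <= j < len(tagged):
            word, tag = tagged[j]
            values.add(tag if kind == "pos" else word)
    return values

def candidate_rules(tagged, index, correct_tag, template):
    """Rules (from_tag, to_tag, conditions) that a template proposes for one error.

    Each condition is (kind, positions, value) and holds if value occurs at
    any one of the positions; a rule fires only if all its conditions hold.
    """
    options = []
    for kind, positions in template:
        values = feature_values(tagged, index, kind, positions)
        if not values:
            return []  # no context available, so this template proposes nothing
        options.append([(kind, tuple(positions), v) for v in sorted(values)])
    from_tag = tagged[index][1]
    return [(from_tag, correct_tag, conds) for conds in product(*options)]

tagged = [("They", "PRP"), ("want", "VBP"), ("to", "TO"), ("race", "NN"), (".", ".")]
# "race" (index 3) should be VB; ask three templates for candidate fixes.
for template in ([("pos", [-1])], [("word", [-2, -1])], [("pos", [-1]), ("pos", [1])]):
    for rule in candidate_rules(tagged, 3, "VB", template):
        print(rule)
```

The word template yields two separate rules, one conditioned on "to" and one on "want", while the two-feature template yields a single rule that requires both the previous tag TO and the next tag ".".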

The thinking behind a transformation-based tagger is this: given the correct training sentences, the output of the initial tagger, and the templates specifying features, generate transformation rules that bring the initial tagger's output more in line with the training sentences. The job of BrillTaggerTrainer is to produce these rules in a way that increases accuracy. A rule that fixes one error may introduce errors elsewhere, so every rule is measured by how many errors it corrects versus how many new errors it introduces.
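Here is a minimal sketch of that measurement for one rule, "change NN to VB when the previous tag is TO", on a few toy sentences. The function names and data are hypothetical, and the code is an illustration of the scoring idea rather than NLTK's implementation.

```python
# Illustrative sketch only (not NLTK's code): score one rule of the form
# "change from_tag to to_tag when the previous tag is prev_tag".

def rule_fires(tagged, i, from_tag, prev_tag):
    return tagged[i][1] == from_tag and i > 0 and tagged[i - 1][1] == prev_tag

def score_rule(predicted_sents, gold_sents, from_tag, to_tag, prev_tag):
    """Return (score, fixed, broken), where score = fixed - broken."""
    fixed = broken = 0
    for predicted, gold in zip(predicted_sents, gold_sents):
        for i, (_, gold_tag) in enumerate(gold):
            if not rule_fires(predicted, i, from_tag, prev_tag):
                continue
            if gold_tag == to_tag:
                fixed += 1    # the tag was wrong and the rule makes it right
            elif gold_tag == from_tag:
                broken += 1   # the tag was right and the rule makes it wrong
            # any other gold tag: wrong before and after, so no effect on score
    return fixed - broken, fixed, broken

gold = [
    [("to", "TO"), ("race", "VB")],
    [("the", "DT"), ("race", "NN")],
    [("to", "TO"), ("run", "VB")],
    [("go", "VB"), ("to", "TO"), ("school", "NN")],
]
# Hypothetical initial-tagger output: the words after "to" and "the" are all NN.
predicted = [
    [("to", "TO"), ("race", "NN")],
    [("the", "DT"), ("race", "NN")],
    [("to", "TO"), ("run", "NN")],
    [("go", "VB"), ("to", "TO"), ("school", "NN")],
]
print(score_rule(predicted, gold, "NN", "VB", "TO"))  # (1, 2, 1)
```

The rule fixes two errors but breaks "to school", so its net score is only 1.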

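The workflow looks something like this:

1. Tag the training sentences with the initial tagger.
2. Compare the result with the correct tags to find the errors.
3. Use the templates to generate candidate transformation rules that would fix those errors.
4. Score each candidate rule by the number of errors it corrects minus the number of new errors it introduces.
5. Keep the highest-scoring rule, apply it to the tagged sentences, and go back to step 2. Stop once the maximum number of rules has been learned or no remaining rule scores high enough.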

There's more...

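You can control the maximum number of rules generated with the max_rules keyword argument to the BrillTaggerTrainer.train() method; the default value is 200. You can also control the quality of the rules with the min_score keyword argument; the default value is 2, though 3 can be a good choice as well. A rule's score is the number of errors it corrects minus the number of new errors it introduces, and training stops once no remaining rule reaches min_score. Because train_brill_tagger() passes its extra keyword arguments on to train(), you can set both there, for example, train_brill_tagger(initial_tagger, train_sents, max_rules=100, min_score=3).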
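The effect of these two settings is easiest to see in a stripped-down training loop. The following is a naive sketch of greedy transformation-based learning in plain Python, under stated assumptions: it is not NLTK's BrillTaggerTrainer (which is considerably more optimized), it uses a single hypothetical template (the previous tag), and every name in it is illustrative. Here max_rules caps how many rules are kept, and min_score ends training as soon as the best remaining rule scores too low.

```python
# Illustrative sketch only: a naive greedy transformation-based learning loop
# showing how max_rules and min_score stop training. It is NOT NLTK's
# BrillTaggerTrainer; every rule uses one hypothetical template, "previous tag".

def prev_tag(sent, i):
    return sent[i - 1][1] if i > 0 else None  # None marks the sentence start

def candidate_rules(predicted, gold):
    """One rule (from_tag, to_tag, prev) per distinct error context."""
    rules = set()
    for p_sent, g_sent in zip(predicted, gold):
        for i, ((_, p_tag), (_, g_tag)) in enumerate(zip(p_sent, g_sent)):
            if p_tag != g_tag:
                rules.add((p_tag, g_tag, prev_tag(p_sent, i)))
    return rules

def score(rule, predicted, gold):
    """Errors fixed minus new errors introduced."""
    from_tag, to_tag, prev = rule
    total = 0
    for p_sent, g_sent in zip(predicted, gold):
        for i, ((_, p_tag), (_, g_tag)) in enumerate(zip(p_sent, g_sent)):
            if p_tag == from_tag and prev_tag(p_sent, i) == prev:
                total += (g_tag == to_tag) - (g_tag == from_tag)
    return total

def apply_rule(rule, sent):
    """Apply a rule, checking conditions against the sentence's input tags."""
    from_tag, to_tag, prev = rule
    return [(w, to_tag) if t == from_tag and prev_tag(sent, i) == prev else (w, t)
            for i, (w, t) in enumerate(sent)]

def train(predicted, gold, max_rules=200, min_score=2):
    learned = []
    while len(learned) < max_rules:            # stop 1: rule budget used up
        candidates = sorted(candidate_rules(predicted, gold), key=repr)
        if not candidates:
            break                               # no errors left to fix
        best = max(candidates, key=lambda r: score(r, predicted, gold))
        best_score = score(best, predicted, gold)
        if best_score < min_score:
            break                               # stop 2: best rule is not good enough
        learned.append((best, best_score))
        predicted = [apply_rule(best, s) for s in predicted]
    return learned

gold = [
    [("to", "TO"), ("race", "VB")],
    [("to", "TO"), ("run", "VB")],
    [("to", "TO"), ("win", "VB")],
    [("go", "VB"), ("to", "TO"), ("school", "NN")],
    [("the", "DT"), ("race", "NN")],
]
# Hypothetical initial-tagger output: every word after "to" or "the" is NN.
initial = [
    [("to", "TO"), ("race", "NN")],
    [("to", "TO"), ("run", "NN")],
    [("to", "TO"), ("win", "NN")],
    [("go", "VB"), ("to", "TO"), ("school", "NN")],
    [("the", "DT"), ("race", "NN")],
]
print(train(initial, gold))               # [(('NN', 'VB', 'TO'), 2)]
print(train(initial, gold, min_score=3))  # []
```

With the default min_score of 2, the loop keeps the NN-to-VB rule (three fixes, one new error) and then stops, because the only rule that would repair "school" has a negative score. Raising min_score to 3 rejects even the first rule.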

Note

Raising max_rules or lowering min_score can greatly increase training time without necessarily increasing accuracy. Change these values with care.

Tracing

You can watch the BrillTaggerTrainer class do its work by passing trace=True into the constructor, for example, trainer = brill_trainer.BrillTaggerTrainer(initial_tagger, templates, deterministic=True, trace=True). This will give you output like the following:

TBL train (fast) (seqs: 3000; tokens: 77511; tpls: 18; min score: 2; min acc: None)
    Finding initial useful rules...
        Found 9869 useful rules.
    Selecting rules...

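This means it found 9869 candidate rules that would each correct at least one error made by the initial tagger. It then selects the best rules one at a time, stopping when it has max_rules rules or when no remaining rule has a score of at least min_score.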

The default is trace=False, which means the trainer will work silently without printing its status.

See also

The Training and combining ngram taggers recipe details the construction of the initial_tagger argument used earlier, and the Default tagging recipe explains the default_tagger argument.
