Chapter 5. Extracting Chunks

In this chapter, we will cover the following recipes:

Chunking and chinking with regular expressions
Merging and splitting chunks with regular expressions
Expanding and removing chunks with regular expressions
Partial parsing with regular expressions
Training a tagger-based chunker
Classification-based chunking
Extracting named entities
Extracting proper noun chunks
Extracting location chunks
Training a named entity chunker
Training a chunker with NLTK-Trainer

Introduction

Chunk extraction, or partial parsing, is the process of extracting short phrases from a part-of-speech tagged sentence. This is different from full parsing in that we're interested in standalone chunks, or phrases, instead of full parse trees (for more on parse trees, see https://en.wikipedia.org/wiki/Parse_tree). The idea is that meaningful phrases can be extracted from a sentence by looking for particular patterns of part-of-speech tags.
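To make the idea of matching part-of-speech tag patterns concrete, here is a minimal sketch using NLTK's `RegexpParser`. It assumes the sentence is already tagged (the tags are written by hand so no tagger or corpus download is needed), and the single `NP` rule is an illustrative grammar, not one taken from this chapter.

```python
from nltk import RegexpParser

# Illustrative rule: an optional determiner, any number of adjectives,
# then one or more nouns (NN, NNS, NNP, ...) form a noun phrase chunk.
chunker = RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")

# Hand-tagged sentence (assumed input; normally produced by a POS tagger).
tagged = [
    ("the", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("jumped", "VBD"), ("over", "IN"),
    ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
]

# parse() returns a shallow Tree rooted at "S" whose children are
# either NP subtrees or untouched (word, tag) tuples.
tree = chunker.parse(tagged)

# Collect the words of each NP chunk as a single string.
np_chunks = [
    " ".join(word for word, _ in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
]
print(np_chunks)  # → ['the quick brown fox', 'the lazy dog']
```

Only the tag sequence matters to the rule: "jumped" and "over" are left unchunked because no pattern covers `VBD` or `IN`, which is exactly the difference from full parsing described above.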

As in Chapter 4, Part-of-speech Tagging, we'll be using the Penn Treebank corpus for basic training and testing of chunk extraction. We'll also be using the CoNLL2000 corpus, as it has a simpler and more flexible format that supports multiple chunk types (for more details on the conll2000 corpus and IOB tags, see the Creating a chunked phrase corpus recipe in Chapter 3, Creating Custom Corpora).
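As a rough illustration of the IOB format and of multiple chunk types, the sketch below converts a short, hand-written fragment in the CoNLL2000 style (word, part-of-speech tag, IOB chunk tag) into an NLTK chunk tree and back, using `conlltags2tree` and `tree2conlltags` from `nltk.chunk`. The fragment is an assumed example typed in directly, so no corpus download is required.

```python
from nltk.chunk import conlltags2tree, tree2conlltags
from nltk.tree import Tree

# Hand-written (word, POS tag, IOB tag) triples. "B-" begins a chunk,
# "I-" continues the current chunk, and "O" would mark a word outside any chunk.
iob = [
    ("Confidence", "NN", "B-NP"),
    ("in", "IN", "B-PP"),
    ("the", "DT", "B-NP"),
    ("pound", "NN", "I-NP"),
    ("is", "VBZ", "B-VP"),
    ("widely", "RB", "I-VP"),
    ("expected", "VBN", "I-VP"),
]

# Build a shallow tree: one subtree per chunk, labeled with its chunk type.
tree = conlltags2tree(iob)

# The chunk types present in this fragment, in sentence order.
chunk_labels = [child.label() for child in tree if isinstance(child, Tree)]
print(chunk_labels)  # → ['NP', 'PP', 'NP', 'VP']

# Converting back yields the original IOB triples.
round_trip = tree2conlltags(tree)
```

Because NP, PP, and VP chunks can all be expressed in the same three-column format, a single corpus can train and evaluate chunkers for several phrase types.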
