Chapter 3. Creating Custom Corpora

In this chapter, we will cover the following recipes:

  • Setting up a custom corpus
  • Creating a wordlist corpus
  • Creating a part-of-speech tagged word corpus
  • Creating a chunked phrase corpus
  • Creating a categorized text corpus
  • Creating a categorized chunk corpus reader
  • Lazy corpus loading
  • Creating a custom corpus view
  • Creating a MongoDB-backed corpus reader
  • Corpus editing with file locking

Introduction

In this chapter, we'll cover how to use corpus readers and create custom corpora. If you want to train your own model, such as a part-of-speech tagger or text classifier, you will need to create a custom corpus to train on. Model training is covered in the subsequent chapters.

You'll also learn how to use the existing corpus data that comes with NLTK. This is essential for later chapters, where we'll need to access these corpora as training data. You've already accessed the WordNet corpus in Chapter 1, Tokenizing Text and WordNet Basics; this chapter will introduce you to many more corpora.
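
As a reminder of what this kind of access looks like, here is a minimal sketch of reading data from two corpora bundled with NLTK, WordNet and Brown. It assumes the corresponding data packages have already been installed with nltk.download(); any other bundled corpus would work the same way.

    # A minimal sketch of accessing NLTK's bundled corpora, assuming the
    # 'wordnet' and 'brown' data packages have been installed via nltk.download().
    from nltk.corpus import wordnet, brown

    # Look up the synsets for a word in WordNet, as in Chapter 1.
    print(wordnet.synsets('cook'))

    # Every bundled corpus exposes a corpus reader interface; words() returns
    # the tokenized words of the whole corpus.
    print(brown.words()[:10])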

We'll also cover creating custom corpus readers, which are useful when your corpus is not in a file format that NLTK already recognizes, or when your corpus is not stored in files at all but in a database such as MongoDB. It is essential to be familiar with tokenization, which was covered in Chapter 1, Tokenizing Text and WordNet Basics.
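
To give a feel for what a corpus reader does before we build our own, here is a minimal sketch that points NLTK's built-in PlaintextCorpusReader at a directory of plain-text files. The corpus root path and the .txt file pattern are assumptions; substitute whatever location and file naming your own corpus uses.

    # A minimal sketch of reading a custom plain-text corpus with NLTK's
    # PlaintextCorpusReader. The corpus root below is hypothetical; point it
    # at any directory containing .txt files.
    import os
    from nltk.corpus.reader import PlaintextCorpusReader

    corpus_root = os.path.expanduser('~/nltk_data/corpora/cookbook')
    reader = PlaintextCorpusReader(corpus_root, r'.*\.txt')

    print(reader.fileids())      # the text files found under the corpus root
    print(reader.words()[:10])   # tokenized words drawn from those files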
