Chapter 8. Distributed Processing and Handling Large Datasets

In this chapter, we will cover the following recipes:

Distributed tagging with execnet
Distributed chunking with execnet
Parallel list processing with execnet
Storing a frequency distribution in Redis
Storing a conditional frequency distribution in Redis
Storing an ordered dictionary in Redis
Distributed word scoring with Redis and execnet

Introduction

NLTK works well for in-memory, single-processor natural language processing. Sometimes, however, you have a lot of data to process and want to take advantage of multiple CPUs, multicore CPUs, or even multiple computers. At other times, you might want to store frequencies and probabilities in a persistent, shared database so that multiple processes can access them simultaneously. For the first case, we'll use execnet to do parallel and distributed processing with NLTK. For the second case, you'll learn how to use the Redis data structure server/database to store frequency distributions and more.
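To make the first case concrete, here is a minimal sketch of the split, process, and merge pattern that parallel processing relies on. It is not the execnet approach used in the recipes: as a dependency-free stand-in, it assumes Python's standard concurrent.futures module and a plain Counter in place of an NLTK frequency distribution. The function names (count_tokens, split_into_chunks, merge_counts, parallel_count) and the whitespace tokenization are hypothetical choices made only for illustration.

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor


def count_tokens(sentences):
    """Count whitespace-separated, lowercased tokens in a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.lower().split())
    return counts


def split_into_chunks(items, n_chunks):
    """Split items into at most n_chunks roughly equal, contiguous slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def merge_counts(partial_counts):
    """Combine per-worker Counters into one overall frequency table."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


def parallel_count(sentences, workers=2):
    """Count tokens using several worker processes, one chunk per task."""
    chunks = split_into_chunks(sentences, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # Each chunk is counted in a separate process, then merged here.
        return merge_counts(pool.map(count_tokens, chunks))


if __name__ == "__main__":
    # Hypothetical toy corpus, used only to exercise the sketch.
    corpus = ["the cat sat", "the dog ran", "a cat ran"]
    print(parallel_count(corpus).most_common(2))
```

The merged result lives only in the memory of the process that collected it. That limitation is what motivates the second case: keeping the counts in a shared store such as Redis so that several processes can read and update them.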
