Finding sentences

Words (tokens) aren't the only structures that we're interested in, however. Another useful grammatical structure is the sentence. In this recipe, we'll follow the same process as in the previous recipe, Tokenizing text, to create a function that pulls sentences from a string, just as tokenize pulled tokens from one.

Getting ready

We'll need to include clojure-opennlp in our project.clj file:

(defproject com.ericrochester/text-data "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [clojure-opennlp "0.3.2"]])

We will also need to require it into the current namespace:

(require '[opennlp.nlp :as nlp])

Finally, we'll download a model for a statistical sentence splitter. I downloaded en-sent.bin from http://opennlp.sourceforge.net/models-1.5/ and saved it as models/en-sent.bin.
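Because the sentence detector is built from a file path, a missing or misplaced model file is an easy mistake to make. The following is a minimal sketch of a check to run before loading the model, assuming the path used above. The name model-present? is a hypothetical helper and is not part of clojure-opennlp:

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper (not part of clojure-opennlp): returns true only
;; when path names an existing regular file.
(defn model-present?
  [path]
  (.isFile (io/file path)))

;; Print a reminder if the model is not where this recipe expects it.
(when-not (model-present? "models/en-sent.bin")
  (println "Model not found; download en-sent.bin into models/ first."))
```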

How to do it…

As in the Tokenizing text recipe, we will start by loading the sentence identification model data, as shown here:

(def get-sentences
  (nlp/make-sentence-detector "models/en-sent.bin"))

Now, we can call get-sentences to split a text into a series of sentences, as follows:

user=> (get-sentences "I never saw a Purple Cow.
           I never hope to see one.
           But I can tell you, anyhow.
           I'd rather see than be one.")
 ["I never saw a Purple Cow."
  "I never hope to see one."
  "But I can tell you, anyhow."
  "I'd rather see than be one."]

How it works…

The model file models/en-sent.bin contains the information that OpenNLP needs to recreate a previously trained sentence identification algorithm. Once we have reinstantiated this algorithm, we can use it to identify the sentences in a text, as we did by calling get-sentences.
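To see why a trained model is worth loading, it can help to compare it with a fixed rule. The following is a hypothetical baseline, not OpenNLP's algorithm: it simply breaks after a period, exclamation mark, or question mark whenever whitespace follows. The name naive-sentences is made up for this illustration:

```clojure
(require '[clojure.string :as str])

;; Hypothetical baseline, NOT OpenNLP's algorithm: break after ., ! or ?
;; whenever whitespace follows, then trim and drop empty pieces.
(defn naive-sentences
  [text]
  (->> (str/split text #"(?<=[.!?])\s+")
       (map str/trim)
       (remove str/blank?)
       vec))

(naive-sentences "I never saw a Purple Cow.\n  I never hope to see one.")
;; => ["I never saw a Purple Cow." "I never hope to see one."]

;; The fixed rule also breaks after the abbreviation.
(naive-sentences "Dr. Smith arrived. He sat down.")
;; => ["Dr." "Smith arrived." "He sat down."]
```

The rule handles the poem, but it cannot tell an abbreviation from the end of a sentence. A statistical model makes that decision from the context around each candidate boundary, which is the information stored in the model file.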
