Knowing the relationships between the elements of a sentence is important in many analysis tasks. It helps in assessing which content of a sentence matters and provides insight into the sentence's meaning. This type of analysis has been used for tasks ranging from grammar checking to speech recognition to language translation.
In the previous section, we demonstrated one approach used to extract the parts of speech. Using this technique, we were able to identify the sentence element types present in a sentence. However, the relationships between these elements are missing. We need to parse the sentence to extract them.
There are several techniques and APIs that can be used to extract this type of information. In this section, we will use OpenNLP to demonstrate one way of extracting the structure of a sentence. The demonstration is centered around the ParserTool class, which uses a previously trained model. The parsing process also returns the probability that each extracted structure is correct. As with many NLP tasks, there are often multiple possible answers.
We start with a try-with-resources block to open an input stream for the model. The en-parser-chunking.bin file contains a model that parses text into its phrases and parts of speech. In this case, it is trained for English:
try (InputStream modelInputStream = new FileInputStream(
        new File("en-parser-chunking.bin"))) {
    ...
} catch (Exception ex) {
    // Handle exceptions
}
Within the try block, an instance of the ParserModel class is created using the input stream. The actual parser is created next using the ParserFactory class's create method:
ParserModel parserModel = new ParserModel(modelInputStream);
Parser parser = ParserFactory.create(parserModel);
We will use the following sentence to test the parser. The ParserTool class's parseLine method does the actual parsing and returns an array of Parse objects. Each of these objects holds one parsing alternative. The last argument of the parseLine method specifies how many alternatives to return:
String sentence = "Let's parse this sentence.";
Parse[] parseTrees = ParserTool.parseLine(sentence, parser, 3);
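The parseLine method receives the sentence as a single string. In the output shown for this example, the final token appears as sentence. with the period still attached, which is consistent with the input being split on whitespace only. The following is a minimal sketch of that behavior using only the standard library; it does not call OpenNLP, and the WhitespaceTokens class and its separateFinalPunctuation helper are hypothetical names introduced for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class WhitespaceTokens {

    /** Splits text on whitespace only, so punctuation stays attached to words. */
    public static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    /** Hypothetical helper: inserts a space before sentence-final punctuation. */
    public static String separateFinalPunctuation(String sentence) {
        return sentence.replaceAll("([.!?])$", " $1");
    }

    public static void main(String[] args) {
        String sentence = "Let's parse this sentence.";
        // Prints [Let's, parse, this, sentence.]
        System.out.println(split(sentence));
        // Prints [Let's, parse, this, sentence, .]
        System.out.println(split(separateFinalPunctuation(sentence)));
    }
}
```

If the attached period is undesirable, one option under this assumption is to separate the punctuation before passing the string to parseLine, so that the period becomes a token of its own.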
The next sequence displays each of the possibilities:
for (Parse tree : parseTrees) {
    tree.show();
}
The output of the show method for this example follows. The tags were previously defined in the Understanding POS tags section:
(TOP (NP (NP (NNP Let's) (NN parse)) (NP (DT this) (NN sentence.))))
(TOP (S (NP (NNP Let's)) (VP (VB parse) (NP (DT this) (NN sentence.)))))
(TOP (S (NP (NNP Let's)) (VP (VBD parse) (NP (DT this) (NN sentence.)))))
The following example reformats the last two outputs to better show the relationships. They differ in how they classify the verb parse:
(TOP
    (S
        (NP (NNP Let's))
        (VP (VB parse)
            (NP (DT this) (NN sentence.))
        )
    )
)
(TOP
    (S
        (NP (NNP Let's))
        (VP (VBD parse)
            (NP (DT this) (NN sentence.))
        )
    )
)
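The difference between two alternatives can also be located programmatically. The following sketch works on the bracketed strings printed by the show method rather than on Parse objects, so it needs no OpenNLP classes. The ParseDiff class and its leaves method are hypothetical names, and the regular expression assumes that every leaf has the form (TAG word) with no embedded spaces or parentheses:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParseDiff {

    // Matches a leaf such as "(VB parse)": a tag followed by a single word.
    private static final Pattern LEAF =
            Pattern.compile("\\(([^\\s()]+) ([^\\s()]+)\\)");

    /** Returns one "word/TAG" entry per leaf of a bracketed parse string. */
    public static List<String> leaves(String bracketed) {
        List<String> result = new ArrayList<>();
        Matcher matcher = LEAF.matcher(bracketed);
        while (matcher.find()) {
            result.add(matcher.group(2) + "/" + matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String second =
            "(TOP (S (NP (NNP Let's)) (VP (VB parse) (NP (DT this) (NN sentence.)))))";
        String third =
            "(TOP (S (NP (NNP Let's)) (VP (VBD parse) (NP (DT this) (NN sentence.)))))";

        List<String> a = leaves(second);
        List<String> b = leaves(third);

        // Report every position where the two parses tag a word differently.
        for (int i = 0; i < Math.min(a.size(), b.size()); i++) {
            if (!a.get(i).equals(b.get(i))) {
                System.out.println(a.get(i) + " vs " + b.get(i));
            }
        }
        // Prints: parse/VB vs parse/VBD
    }
}
```

This only compares the tags on the leaves. Two parses that agree on every tag but group the words into different phrases would need a comparison of the tree structure itself.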
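When there are multiple parse alternatives, the Parse class's getProb method returns a score that reflects the model's confidence in each alternative. The values are log probabilities, so they are negative, and values closer to zero indicate higher confidence. The following sequence demonstrates this method: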
for (Parse tree : parseTrees) {
    out.println("Probability: " + tree.getProb());
}
The output follows:
Probability: -3.6810244423259078
Probability: -3.742475884515823
Probability: -4.16148634555491
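Negative scores such as these are hard to compare at a glance. Assuming they are natural-log probabilities, one way to make them easier to read is to exponentiate them and rescale so that the returned alternatives sum to 1. The sketch below does this with the standard library only; the ParseProbabilities class and its normalize method are hypothetical names, and the resulting weights are relative to the three alternatives requested, not absolute probabilities:

```java
public class ParseProbabilities {

    /**
     * Converts log scores into relative weights that sum to 1.
     * Assumes the inputs are natural-log probabilities.
     */
    public static double[] normalize(double[] logProbs) {
        // Subtract the largest score before exponentiating for numerical stability.
        double max = Double.NEGATIVE_INFINITY;
        for (double logProb : logProbs) {
            max = Math.max(max, logProb);
        }
        double[] weights = new double[logProbs.length];
        double sum = 0.0;
        for (int i = 0; i < logProbs.length; i++) {
            weights[i] = Math.exp(logProbs[i] - max);
            sum += weights[i];
        }
        for (int i = 0; i < weights.length; i++) {
            weights[i] /= sum;
        }
        return weights;
    }

    public static void main(String[] args) {
        // Scores copied from the output above.
        double[] logProbs = {
            -3.6810244423259078, -3.742475884515823, -4.16148634555491
        };
        for (double weight : normalize(logProbs)) {
            System.out.printf("Relative weight: %.3f%n", weight);
        }
    }
}
```

The ordering of the alternatives is unchanged by this conversion: the first parse still receives the largest weight and the third the smallest.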
Another interesting NLP task is sentiment analysis, which we will demonstrate next.