Chapter 9. Discourse Analysis – Knowing Is Believing

Discourse analysis is another application of Natural Language Processing. It may be defined as the process of determining contextual information that is useful for performing other tasks, such as anaphora resolution (AR), which is covered later in this chapter, named entity recognition (NER), and so on.

This chapter will include the following topics:

  • Introducing discourse analysis
  • Discourse analysis using Centering Theory
  • Anaphora resolution

Introducing discourse analysis

The word discourse, in linguistic terms, means language in use. Discourse analysis may be defined as the process of performing text or language analysis, which involves interpreting the text and understanding the social interactions behind it. Discourse analysis may involve dealing with morphemes, n-grams, tenses, verbal aspects, page layouts, and so on. A discourse may be defined as a sequence of sentences.

In most cases, we can interpret the meaning of the sentence on the basis of the preceding sentences.

Consider the discourse "John went to the club on Saturday. He met Sam." Here, He refers to John.

Discourse Representation Theory (DRT) has been developed to provide a means for performing AR. A Discourse Representation Structure (DRS) has been developed that provides the meaning of discourse with the help of discourse referents and conditions. Discourse referents refer to variables used in first-order logic and things under consideration in a discourse. A discourse representation structure's conditions refer to the atomic formulas used in first-order predicate logic.

First Order Predicate Logic (FOPL) was developed to extend the idea of propositional logic. FOPL involves the use of functions, arguments, and quantifiers. Two types of quantifiers are used to represent general sentences, namely, universal quantifiers and existential quantifiers. In FOPL, connectives, constants, and variables are also used. For instance, Robin is a bird can be represented in FOPL as bird(robin).
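As a minimal sketch of these building blocks, the following snippet parses one atomic fact, one universally quantified sentence, and one existentially quantified sentence with nltk.sem.logic. The predicates flies and sings are illustrative assumptions and do not come from the text:

```python
from nltk.sem.logic import Expression

read_expr = Expression.fromstring

# An atomic formula: a predicate applied to a constant.
fact = read_expr('bird(robin)')

# A universally quantified sentence ("every bird flies"); flies is illustrative.
universal = read_expr('all x.(bird(x) -> flies(x))')

# An existentially quantified sentence ("some bird sings"); sings is illustrative.
existential = read_expr('exists x.(bird(x) & sings(x))')

for expr in (fact, universal, existential):
    # Show which expression class NLTK chose for each formula.
    print(type(expr).__name__, expr)
```

Each parsed formula is an object of a different class, which is what the class diagram of nltk.sem.logic describes.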

Let's see an example of the discourse representation structure:

[Figure: discourse representation structures for the two example sentences]

The preceding diagram is a representation of the following sentences:

  1. John went to a club
  2. John went to a club. He met Sam.

Here, the discourse consists of two sentences. A Discourse Representation Structure may represent the entire text. To process a DRS computationally, it needs to be converted into a linear format.

The NLTK module that can be used to provide first order predicate logic implementation is nltk.sem.logic. Its UML diagram is shown here:

[Figure: UML diagram of the nltk.sem.logic module]

The nltk.sem.logic module is used to define the expressions of first order predicate logic. Its UML diagram is comprised of various classes that are required for the representation of objects in first order predicate logic as well as their methods. The methods that are included are as follows:

  • substitute_bindings(bindings): Here, bindings represents a variable-to-expression mapping. The method replaces the variables present in the expression with the values they are bound to.
  • variables(): This returns the set of all the variables that can be replaced. It consists of constants as well as free variables.
  • replace(variable, expression, replace_bound): This is used for substituting the expression for every instance of a variable; replace_bound specifies whether bound variables should be replaced as well.
  • normalize(): This is used to rename the autogenerated unique variables.
  • visit(function, combinator, default): This is used to call the function on each subexpression; the results are passed to the combinator, which begins with a default value. The result of the combination is returned.
  • free(indvar_only): This is used to return the set of all the free variables of the object. Only individual variables are returned if indvar_only is set to True.
  • simplify(): This is used to simplify the expression that represents an object.
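A few of these methods can be tried on small expressions. The following is a sketch only, and the formulas are made up for illustration. It assumes a recent NLTK release, in which free() is called without arguments and replace() leaves bound variables untouched by default:

```python
from nltk.sem.logic import Expression, Variable

read_expr = Expression.fromstring

# x is free here; y is bound by the existential quantifier.
expr = read_expr('exists y.(likes(x, y) & bird(y))')

# free() returns the set of free variables of the expression.
free_vars = expr.free()
print(free_vars)

# replace() substitutes an expression for every free instance of a variable.
replaced = expr.replace(Variable('x'), read_expr('john'))
print(replaced)

# simplify() performs beta-reduction on a lambda application.
reduced = read_expr(r'(\x.bird(x))(robin)').simplify()
print(reduced)
```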

The NLTK module that provides a base for the discourse representation theory is nltk.sem.drt. It is built on top of nltk.sem.logic. Its UML class diagram comprises classes that are inherited from the nltk.sem.logic module. The following are the methods described in this module:

  • get_refs(recursive): This method obtains the referents for the current discourse.
  • fol(): This method is used for the conversion of a DRS into first order predicate logic.
  • draw(): This method is used for drawing a DRS with the help of the Tkinter graphics library.
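As a brief sketch, get_refs() and fol() can be called on a parsed DRS without opening a graphics window. The DRS used here is the "John met Sam" structure from this section; the variable names ref_names and fol_form are only illustrative:

```python
from nltk.sem.drt import DrtExpression

# Parse a DRS given in linear format: referents first, then conditions.
drs = DrtExpression.fromstring('([x, y], [John(x), Sam(y), Meet(x, y)])')

# get_refs() returns the discourse referents as Variable objects.
ref_names = [str(ref) for ref in drs.get_refs()]
print(ref_names)

# fol() converts the DRS into a first order logic expression.
fol_form = drs.fol()
print(fol_form)
```

The draw() method is left out here because it needs a display for Tkinter.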

Let's see the UML class diagram of the nltk.sem.drt module:

[Figure: UML class diagram of the nltk.sem.drt module]

The NLTK module that provides access to WordNet 3.0 is nltk.corpus.reader.wordnet.

Linear format comprises discourse referents and DRS conditions, for example:

( [x], [John(x), Went(x)] )

Let's see the following code in NLTK, which can be used for the implementation of DRS:

>>> import nltk
>>> expr_read = nltk.sem.DrtExpression.fromstring
>>> expr1 = expr_read('([x], [John(x), Went(x)])')
>>> print(expr1)
([x],[John(x), Went(x)])
>>> expr1.draw()
>>> print(expr1.fol())
exists x.(John(x) & Went(x))

The preceding code of NLTK will draw the following image:

[Figure: DRS drawn by expr1.draw()]

Here, the expression is converted into FOPL using the fol() method.

Let's see the following code in NLTK for the other expression:

>>> import nltk
>>> expr_read = nltk.sem.DrtExpression.fromstring
>>> expr2 = expr_read('([x,y], [John(x), Went(x),Sam(y),Meet(x,y)])')
>>> print(expr2)
([x,y],[John(x), Went(x), Sam(y), Meet(x,y)])
>>> expr2.draw()
>>> print(expr2.fol())
exists x y.(John(x) & Went(x) & Sam(y) & Meet(x,y))

The fol() function is used to obtain the first order predicate logic equivalent of the expression. The preceding code displays the following image:

[Figure: DRS drawn by expr2.draw()]

We can perform the concatenation of two DRS using the DRS concatenation operator (+). Let's see the following code in NLTK that can be used to perform the concatenation of two DRS:

>>> import nltk
>>> expr_read = nltk.sem.DrtExpression.fromstring
>>> expr3 = expr_read('([x], [John(x), eats(x)])+ ([y],[Sam(y),eats(y)])')
>>> print(expr3)
(([x],[John(x), eats(x)]) + ([y],[Sam(y), eats(y)]))
>>> print(expr3.simplify())
([x,y],[John(x), eats(x), Sam(y), eats(y)]) 
>>> expr3.draw()

The preceding code draws the following image:

[Figure: DRS drawn by expr3.draw()]

Here, simplify() is used to simplify the expression.

Let's see the following code in NLTK, which can be used to embed one DRS into another:

>>> import nltk
>>> expr_read = nltk.sem.DrtExpression.fromstring
>>> expr4 = expr_read('([],[(([x],[student(x)])->([y],[book(y),read(x,y)]))])')
>>> print(expr4.fol())
all x.(student(x) -> exists y.(book(y) & read(x,y)))

Let's see another example that can be used to combine two sentences. Here, PRO has been used and resolve_anaphora() is used to perform AR:

>>> import nltk
>>> expr_read = nltk.sem.DrtExpression.fromstring
>>> expr5 = expr_read('([x,y],[ram(x),food(y),eats(x,y)])')
>>> expr6 = expr_read('([u,z],[PRO(u),coffee(z),drinks(u,z)])')
>>> expr7=expr5+expr6
>>> print(expr7.simplify())
([u,x,y,z],[ram(x), food(y), eats(x,y), PRO(u), coffee(z), drinks(u,z)])
>>> print(expr7.simplify().resolve_anaphora())
([u,x,y,z],[ram(x), food(y), eats(x,y), (u = [x,y,z]), coffee(z), drinks(u,z)])

Discourse analysis using Centering Theory

Discourse analysis using Centering Theory is the first step toward corpus annotation. It also involves the task of AR. In Centering Theory, we perform the task of segmenting discourse into various units for analysis.

Centering Theory involves the following:

  • Interaction between purposes or intentions of discourse participants and discourse
  • Attention of participants
  • Discourse structure

Centering is related to the participants' attention and to how the local as well as the global structures affect the expressions and the coherence of the discourse.

Anaphora resolution

AR may be defined as the process by which a pronoun or a noun phrase used in a sentence is resolved to the specific entity it refers to, on the basis of discourse knowledge.

For example:

John helped Sara. He was kind.

Here, He refers to John.
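This example can be written as two DRSs and passed to the resolve_anaphora() method of nltk.sem.drt, as in the following sketch. The predicate names helped and kind are assumptions made for illustration. Note that NLTK only lists the candidate antecedents for the pronoun; it does not choose between John and Sara:

```python
from nltk.sem.drt import DrtExpression

read_drs = DrtExpression.fromstring

# "John helped Sara."
first = read_drs('([x, y], [John(x), Sara(y), helped(x, y)])')

# "He was kind." PRO(z) marks z as a pronoun that still needs an antecedent.
second = read_drs('([z], [PRO(z), kind(z)])')

# Merge the two sentences into one DRS, then resolve the pronoun.
merged = (first + second).simplify()
resolved = merged.resolve_anaphora()
print(resolved)
```

In the resolved DRS, the condition PRO(z) is replaced by an equality between z and the list of possible antecedents, here x and y.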

AR is of three types, namely:

  • Pronominal: Here, the referent is referred to by a pronoun. For example, Sam found the love of his life. Here, 'his' refers to 'Sam'.
  • Definite noun phrase: Here, the antecedent may be referred to by the phrase of the form, <the><noun phrase>. For example, The relationship could not last long. Here, The relationship refers to the love in the previous sentence.
  • Quantifier/ordinal: The quantifier, such as one, and the ordinal, such as first, are also examples of AR. For example, He began a new one. Here, one refers to the relationship.

In cataphora, the referring expression precedes the expression it refers to. For example, After his class, Sam will go home. Here, his refers to Sam.

To integrate some extensions into the NLTK architecture, a new module is developed on top of the existing modules, nltk.sem.logic and nltk.sem.drt. The new module acts as a replacement for the nltk.sem.drt module, in which all the classes are replaced with enhanced classes.

A method called resolve() can be called directly or indirectly on an object of the AbstractDRS class. It returns a list consisting of resolved copies of that object. An object that needs to be resolved must override the readings() method. The resolve() method generates readings using the traverse() function, which sorts the list of operations by priority. The priority order is as follows:

  • Binding operations
  • Local accommodation operations
  • Intermediate accommodation operations
  • Global accommodation operations

Let's see the flow diagram of the traverse() function:

[Figure: flow diagram of the traverse() function]

After the priority order of operations is generated, the following takes place:

  • Readings are generated from the operation with the help of the deepcopy() method. The current operation is taken as an argument.
  • When the readings() method runs, a list of operations is performed.
  • While the list of operations is not empty, the operations in it are run.
  • If there are no operations left to be performed, an admissibility check is run on the final reading; if the check is successful, the reading is stored.
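The control flow described above can be sketched in plain Python. This is a toy illustration only, not the code of the module discussed here: the PRIORITY table, the traverse() signature, the helper functions, and the list-of-strings reading are all hypothetical stand-ins chosen to show the ordering, the deep copy, and the final admissibility check:

```python
from copy import deepcopy

# Hypothetical priority table: a lower number means the operation is tried earlier.
PRIORITY = {"binding": 0, "local": 1, "intermediate": 2, "global": 3}

def traverse(reading, operations, is_admissible):
    """Toy traversal: run operations in priority order on copies of a reading."""
    results = []
    for kind, run in sorted(operations, key=lambda op: PRIORITY[op[0]]):
        candidate = deepcopy(reading)   # each operation works on its own copy
        pending = run(candidate)        # an operation may return follow-up operations
        if pending:
            results.extend(traverse(candidate, pending, is_admissible))
        elif is_admissible(candidate):  # no operations left: check and store
            results.append(candidate)
    return results

def bind(reading):
    # Hypothetical binding operation: identify the pronoun with a known referent.
    reading.append("z = x")
    return []

def accommodate_globally(reading):
    # Hypothetical global accommodation: introduce a new referent instead.
    reading.append("new referent for z")
    return []

operations = [("global", accommodate_globally), ("binding", bind)]
readings = traverse(["John(x)", "PRO(z)"], operations, lambda r: True)
print(readings)
```

Although the global accommodation is listed first, the binding operation runs first because of its priority, so the bound reading appears first in the result.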

In AbstractDRS(), the resolve() method is defined. It is defined as follows:

def resolve(self, verbose=False)

The PresuppositionDRS class includes the following methods:

  • find_bindings(drs_list, collect_event_data): Bindings are found from the list of DRS instances using the is_possible_binding method. Collection of participation information is done if collect_event_data is set to True.
  • is_possible_binding(cond): This finds out whether the condition is a binding candidate and makes sure that it is a unary predicate with features that match the trigger condition.
  • is_presupposition_cond(cond): This is used to identify a trigger condition among all the conditions.
  • presupposition_readings(trail): This is like readings in the subclasses of PresuppositionDRS.

Let's see the classes that are inherited from AbstractDRS:

[Figure: classes inherited from AbstractDRS]

Let's see the classes that are inherited from DrtAbstractVariableExpression:

[Figure: classes inherited from DrtAbstractVariableExpression]

Let's see the classes inherited from DrtBooleanExpression:

[Figure: classes inherited from DrtBooleanExpression]

Let's see the classes inherited from DrtApplicationExpression:

[Figure: classes inherited from DrtApplicationExpression]

Let's see the classes inherited from DRS:

[Figure: classes inherited from DRS]