Semantic analysis, or meaning generation, is one of the tasks in NLP. It is the process of determining the meaning of character sequences or word sequences, and it can be used to perform disambiguation.
This chapter will include the following topics:
NLP means performing computations on natural language. One of the steps in processing natural language is semantic analysis. Once the syntactic structure of an input sentence has been built, the semantic analysis of the sentence can be performed. Semantic interpretation means assigning a meaning to a sentence. Contextual interpretation means mapping the logical form to the knowledge representation. The primitive, or basic unit, of semantic analysis is referred to as a meaning or sense.
One of the early tools dealing with senses is ELIZA, developed in the 1960s by Joseph Weizenbaum. It used substitution and pattern-matching techniques to analyze an input sentence and produce a response. MARGIE was developed by Roger Schank in the 1970s. It could represent all English verbs using 11 primitives, and it could interpret the sense of a sentence and represent it with the help of those primitives. It also gave rise to the concept of scripts. From MARGIE, the Script Applier Mechanism (SAM) was developed. It could translate sentences from different languages, such as English, Chinese, Russian, Dutch, and Spanish.
To process textual data in Python, a library such as TextBlob can be used. TextBlob provides APIs for NLP tasks such as part-of-speech tagging, noun phrase extraction, classification, machine translation, and sentiment analysis.
Semantic analysis can be used to query a database and retrieve information. Another Python library, Gensim, can be used to perform document indexing, topic modeling, and similarity retrieval. Polyglot is an NLP tool that supports various multilingual applications. It provides NER for 40 different languages, tokenization for 165 different languages, language detection for 196 different languages, sentiment analysis for 136 different languages, POS tagging for 16 different languages, Morphological Analysis for 135 different languages, word embedding for 137 different languages, and transliteration for 69 different languages. MontyLingua is an NLP tool that is used to perform the semantic interpretation of English text. From English sentences, it extracts semantic information, such as verbs, nouns, adjectives, dates, phrases, and so on.
Sentences can be formally represented using logic. The basic expressions, or sentences, in propositional logic are represented using propositional symbols such as P, Q, R, and so on. Complex expressions in propositional logic are built from these symbols using Boolean operators. For example, to represent the sentence "If it is raining, I'll wear a raincoat" in propositional logic, let P stand for "It is raining" and Q for "I'll wear a raincoat"; the sentence is then the implication P -> Q.
Consider the following code to represent operators used in NLTK:
>>> import nltk
>>> nltk.boolean_ops()
negation        -
conjunction     &
disjunction     |
implication     ->
equivalence     <->
Well-formed Formulas (WFF) are formed using propositional symbols or using a combination of propositional symbols and Boolean operators.
Let's see the following code in NLTK that categorizes logical expressions into different subclasses:
>>> import nltk
>>> input_expr = nltk.sem.Expression.fromstring
>>> input_expr('X | (Y -> Z)')
<OrExpression (X | (Y -> Z))>
>>> input_expr('-(X & Y)')
<NegatedExpression -(X & Y)>
>>> input_expr('X & Y')
<AndExpression (X & Y)>
>>> input_expr('X <-> -- X')
<IffExpression (X <-> --X)>
To map propositional symbols to True or False values, the Valuation class is used in NLTK. A Model built from a valuation can then evaluate logical expressions:
>>> import nltk
>>> value = nltk.Valuation([('X', True), ('Y', False), ('Z', True)])
>>> value['Z']
True
>>> domain = set()
>>> v = nltk.Assignment(domain)
>>> u = nltk.Model(domain, value)
>>> print(u.evaluate('(X & Y)', v))
False
>>> print(u.evaluate('-(X & Y)', v))
True
>>> print(u.evaluate('(X & Z)', v))
True
>>> print(u.evaluate('(X | Y)', v))
True
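To make the evaluation step concrete, the following is a minimal pure-Python sketch of what evaluating a propositional formula under a valuation amounts to. It is not NLTK's implementation; the names `valuation`, `implies`, `results`, and `truth_table` are hypothetical and introduced only for illustration.

```python
from itertools import product

# Hypothetical valuation mirroring the one used in the NLTK session above.
valuation = {'X': True, 'Y': False, 'Z': True}


def implies(p, q):
    """Material implication: false only when p is true and q is false."""
    return (not p) or q


X, Y, Z = valuation['X'], valuation['Y'], valuation['Z']

# Each formula is evaluated by replacing symbols with their truth values.
results = {
    '(X & Y)': X and Y,
    '-(X & Y)': not (X and Y),
    '(X & Z)': X and Z,
    '(X | Y)': X or Y,
    'X | (Y -> Z)': X or implies(Y, Z),
}

# Truth table of P -> Q over all four combinations of truth values.
truth_table = [(p, q, implies(p, q)) for p, q in product([True, False], repeat=2)]

for formula, value in results.items():
    print(formula, '=', value)
for p, q, r in truth_table:
    print(p, '->', q, 'is', r)
```

The truth table shows that an implication such as "If it is raining, I'll wear a raincoat" is false only when it is raining and no raincoat is worn.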
First-order predicate logic, involving constants and predicates, is depicted in NLTK in the following code:
>>> import nltk
>>> input_expr = nltk.sem.Expression.fromstring
>>> expression = input_expr('run(marcus)', type_check=True)
>>> expression.argument
<ConstantExpression marcus>
>>> expression.argument.type
e
>>> expression.function
<ConstantExpression run>
>>> expression.function.type
<e,?>
>>> sign = {'run': '<e, t>'}
>>> expression = input_expr('run(marcus)', signature=sign)
>>> expression.function.type
e
The signature is used in NLTK to map non-logical constants to their associated types.
Consider the following code in NLTK that helps to generate a query and retrieve data from a database:
>>> import nltk
>>> nltk.data.show_cfg('grammars/book_grammars/sql1.fcfg')
% start S
S[SEM=(?np + WHERE + ?vp)] -> NP[SEM=?np] VP[SEM=?vp]
VP[SEM=(?v + ?pp)] -> IV[SEM=?v] PP[SEM=?pp]
VP[SEM=(?v + ?ap)] -> IV[SEM=?v] AP[SEM=?ap]
VP[SEM=(?v + ?np)] -> TV[SEM=?v] NP[SEM=?np]
VP[SEM=(?vp1 + ?c + ?vp2)] -> VP[SEM=?vp1] Conj[SEM=?c] VP[SEM=?vp2]
NP[SEM=(?det + ?n)] -> Det[SEM=?det] N[SEM=?n]
NP[SEM=(?n + ?pp)] -> N[SEM=?n] PP[SEM=?pp]
NP[SEM=?n] -> N[SEM=?n] | CardN[SEM=?n]
CardN[SEM='1000'] -> '1,000,000'
PP[SEM=(?p + ?np)] -> P[SEM=?p] NP[SEM=?np]
AP[SEM=?pp] -> A[SEM=?a] PP[SEM=?pp]
NP[SEM='Country="greece"'] -> 'Greece'
NP[SEM='Country="china"'] -> 'China'
Det[SEM='SELECT'] -> 'Which' | 'What'
Conj[SEM='AND'] -> 'and'
N[SEM='City FROM city_table'] -> 'cities'
N[SEM='Population'] -> 'populations'
IV[SEM=''] -> 'are'
TV[SEM=''] -> 'have'
A -> 'located'
P[SEM=''] -> 'in'
P[SEM='>'] -> 'above'
>>> from nltk import load_parser
>>> test = load_parser('grammars/book_grammars/sql1.fcfg')
>>> q = "What cities are in Greece"
>>> t = list(test.parse(q.split()))
>>> ans = t[0].label()['SEM']
>>> ans = [s for s in ans if s]
>>> q = ' '.join(ans)
>>> print(q)
SELECT City FROM city_table WHERE Country="greece"
>>> from nltk.sem import chat80
>>> r = chat80.sql_query('corpora/city_database/city.db', q)
>>> for p in r:
...     print(p[0], end=" ")
...
athens
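The last step above, running the generated SQL against a database, can be tried without the NLTK corpus. The following is a minimal sketch using Python's standard `sqlite3` module. The in-memory table and its rows are illustrative stand-ins rather than the contents of `city.db`, the helper `cities_in` is hypothetical, and the country is passed as a query parameter instead of being embedded in the SQL string.

```python
import sqlite3

# Illustrative in-memory stand-in for the city database (assumed rows).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE city_table (City TEXT, Country TEXT)')
conn.executemany(
    'INSERT INTO city_table VALUES (?, ?)',
    [('athens', 'greece'), ('canton', 'china'), ('peking', 'china')],
)


def cities_in(country):
    """Return the cities of a country, sorted by name.

    Parameterized form of:
    SELECT City FROM city_table WHERE Country="<country>"
    """
    rows = conn.execute(
        'SELECT City FROM city_table WHERE Country = ? ORDER BY City',
        (country,),
    )
    return [row[0] for row in rows]


print(cities_in('greece'))
```

Passing the value as a parameter avoids relying on how a particular SQLite build treats double-quoted strings, and it is the usual way to keep user-supplied text out of the SQL itself.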
Named entity recognition (NER) is the process in which proper nouns or named entities are located in a document. Then, these Named Entities are classified into different categories, such as Name of Person, Location, Organization, and so on.
The NER tagset defined by IIIT-Hyderabad for IJCNLP 2008 consists of 12 tags. These are described here:
SNO. | Named entity tag | Meaning |
---|---|---|
1 | NEP | Name of Person |
2 | NED | Name of Designation |
3 | NEO | Name of Organization |
4 | NEA | Name of Abbreviation |
5 | NEB | Name of Brand |
6 | NETP | Title of Person |
7 | NETO | Title of Object |
8 | NEL | Name of Location |
9 | NETI | Time |
10 | NEN | Number |
11 | NEM | Measure |
12 | NETE | Terms |
One of the applications of NER is information extraction. In NLTK, we can perform the task of information extraction by storing the tuple (entity, relation, entity), and then, the entity value can be retrieved.
Consider an example in NLTK that shows how information extraction is performed:
>>> import nltk
>>> locations = [('Jaipur', 'IN', 'Rajasthan'), ('Ajmer', 'IN', 'Rajasthan'),
...              ('Udaipur', 'IN', 'Rajasthan'), ('Mumbai', 'IN', 'Maharashtra'),
...              ('Ahmedabad', 'IN', 'Gujarat')]
>>> q = [x1 for (x1, relation, x2) in locations if x2 == 'Rajasthan']
>>> print(q)
['Jaipur', 'Ajmer', 'Udaipur']
The nltk.tag.stanford module makes use of the Stanford taggers to perform NER. We can download the tagger models from http://nlp.stanford.edu/software.
Let's see the following example in NLTK that can be used to perform NER using the Stanford tagger:
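>>> from nltk.tag import StanfordNERTagger
>>> # The Stanford NER jar and the model file must be locatable, for example
>>> # through the CLASSPATH and STANFORD_MODELS environment variables.
>>> sentence = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')
>>> sentence.tag('John goes to NY'.split())
[('John', 'PERSON'), ('goes', 'O'), ('to', 'O'), ('NY', 'LOCATION')]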
A classifier has been trained in NLTK to detect named entities. Using the function nltk.ne_chunk(), named entities can be identified in a POS-tagged text. If the parameter binary is set to True, then the named entities are detected and tagged with the NE tag; otherwise, they are tagged with labels such as PERSON, GPE, and ORGANIZATION.
Let's see the following code, which detects named entities, if they exist, and tags them with the NE tag:
>>> import nltk
>>> sentences1 = nltk.corpus.treebank.tagged_sents()[17]
>>> print(nltk.ne_chunk(sentences1, binary=True))
(S The/DT total/NN of/IN 18/CD deaths/NNS from/IN malignant/JJ mesothelioma/NN ,/, lung/NN cancer/NN and/CC asbestosis/NN was/VBD far/RB higher/JJR than/IN */-NONE- expected/VBN *?*/-NONE- ,/, the/DT researchers/NNS said/VBD 0/-NONE- *T*-1/-NONE- ./.)
>>> sentences2 = nltk.corpus.treebank.tagged_sents()[7]
>>> print(nltk.ne_chunk(sentences2, binary=True))
(S A/DT (NE Lorillard/NNP) spokewoman/NN said/VBD ,/, ``/`` This/DT is/VBZ an/DT old/JJ story/NN ./.)
>>> print(nltk.ne_chunk(sentences2))
(S A/DT (ORGANIZATION Lorillard/NNP) spokewoman/NN said/VBD ,/, ``/`` This/DT is/VBZ an/DT old/JJ story/NN ./.)
Consider another example in NLTK that can be used to detect named entities:
>>> import nltk
>>> from nltk.corpus import conll2002
>>> for documents in conll2002.chunked_sents('ned.train')[25]:
...     print(documents)
...
(PER Vandenbussche/Adj)
('zelf', 'Pron')
('besloot', 'V')
('dat', 'Conj')
('het', 'Art')
('hof', 'N')
('"', 'Punc')
('de', 'Art')
('politieke', 'Adj')
('zeden', 'N')
('uit', 'Prep')
('het', 'Art')
('verleden', 'N')
('"', 'Punc')
('heeft', 'V')
('willen', 'V')
('veroordelen', 'V')
('.', 'Punc')
A chunker is a program that partitions plain text into sequences of semantically related words. To perform NER in NLTK, default chunkers are used. Default chunkers are classifier-based chunkers that have been trained on the ACE corpus. Other chunkers have been trained on parsed or chunked NLTK corpora. The languages covered by these NLTK chunkers are as follows:
Consider another example in NLTK that identifies named entities and categorizes them into different named entity classes:
>>> import nltk
>>> sentence = "I went to Greece to meet John"
>>> tok = nltk.word_tokenize(sentence)
>>> pos_tag = nltk.pos_tag(tok)
>>> print(nltk.ne_chunk(pos_tag))
(S I/PRP went/VBD to/TO (GPE Greece/NNP) to/TO meet/VB (PERSON John/NNP))
The Hidden Markov Model (HMM) is one of the popular statistical approaches to NER. An HMM is defined as a Stochastic Finite State Automaton (SFSA) consisting of a finite set of states, each of which is associated with a probability distribution. The states are unobserved, or hidden. Given a sequence of observations, an HMM yields the optimal state sequence as its output. An HMM is based on the Markov chain property, according to which the probability of the next state depends only on the current state (in tagging, on the previous tag). It is one of the simplest approaches to implement. The drawbacks of an HMM are that it requires a large amount of training data and that it cannot capture long-range dependencies. HMM consists of the following:
An HMM is represented by the tuple $\lambda = (A, B, \pi)$.
Start probability, or initial state probability ($\pi$), may be defined as the probability that a particular tag occurs first in a sentence.
Transition probability ($A = \{a_{ij}\}$) may be defined as the probability that the next tag in a sentence is $j$, given that the current tag is $i$:
$$a_{ij} = \frac{\text{number of transitions from state } s_i \text{ to state } s_j}{\text{number of transitions from state } s_i}$$
Emission probability ($B = \{b_j(k)\}$) may be defined as the probability of observing the output symbol $k$ given the state $j$:
$$b_j(k) = \frac{\text{number of times in state } j \text{ and observing the symbol } k}{\text{expected number of times in state } j}$$
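When tagged training data is available, the three parameter sets defined above can be estimated by simple counting. The following is a minimal sketch under that assumption; the function `estimate_hmm` and the toy corpus `toy` are hypothetical, and no smoothing is applied, so unseen events get no probability at all.

```python
from collections import Counter, defaultdict


def estimate_hmm(tagged_sentences):
    """Estimate (pi, A, B) from sentences given as lists of (word, tag) pairs."""
    start_counts = Counter()
    trans_counts = defaultdict(Counter)
    emit_counts = defaultdict(Counter)

    for sentence in tagged_sentences:
        if not sentence:
            continue
        # Start probability: which tag opens the sentence.
        start_counts[sentence[0][1]] += 1
        # Emission counts: how often each tag produces each word.
        for word, tag in sentence:
            emit_counts[tag][word] += 1
        # Transition counts: tag i followed by tag j.
        for (_, tag_i), (_, tag_j) in zip(sentence, sentence[1:]):
            trans_counts[tag_i][tag_j] += 1

    def normalize(counter):
        total = sum(counter.values())
        return {key: count / total for key, count in counter.items()}

    pi = normalize(start_counts)
    A = {tag: normalize(counts) for tag, counts in trans_counts.items()}
    B = {tag: normalize(counts) for tag, counts in emit_counts.items()}
    return pi, A, B


# Toy annotated corpus (illustrative only).
toy = [
    [('John', 'NNP'), ('runs', 'VBZ')],
    [('Mary', 'NNP'), ('sleeps', 'VBZ')],
    [('the', 'DT'), ('dog', 'NN'), ('runs', 'VBZ')],
]
pi, A, B = estimate_hmm(toy)
print(pi)
print(A)
print(B)
```

In this toy corpus two of the three sentences start with NNP, so its start probability is 2/3, and VBZ emits "runs" in two of its three occurrences.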
The Baum-Welch algorithm is used to find the maximum likelihood and posterior mode estimates of the HMM parameters. The forward-backward algorithm is used to find the posterior marginals of all the hidden state variables, given a sequence of emissions or observations.
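The forward-backward computation of posterior marginals can be sketched in a few lines. The example below covers only that part, not the Baum-Welch re-estimation loop. The two-state model (NE for named entity, O for other) and all of its probabilities are made-up values chosen for illustration, and `forward_backward` is a hypothetical helper name.

```python
def forward_backward(obs, states, start_p, trans_p, emit_p):
    """Return per-position posterior state distributions and the sequence likelihood."""
    n = len(obs)

    # Forward pass: fwd[t][s] = P(obs[0..t], state at t is s).
    fwd = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    for t in range(1, n):
        fwd.append({
            s: emit_p[s][obs[t]] * sum(fwd[t - 1][p] * trans_p[p][s] for p in states)
            for s in states
        })

    # Backward pass: bwd[t][s] = P(obs[t+1..] | state at t is s).
    bwd = [None] * n
    bwd[n - 1] = {s: 1.0 for s in states}
    for t in range(n - 2, -1, -1):
        bwd[t] = {
            s: sum(trans_p[s][nxt] * emit_p[nxt][obs[t + 1]] * bwd[t + 1][nxt]
                   for nxt in states)
            for s in states
        }

    likelihood = sum(fwd[n - 1][s] for s in states)
    posteriors = [
        {s: fwd[t][s] * bwd[t][s] / likelihood for s in states}
        for t in range(n)
    ]
    return posteriors, likelihood


# Illustrative (assumed) parameters for a two-state model.
states = ('NE', 'O')
start_p = {'NE': 0.5, 'O': 0.5}
trans_p = {'NE': {'NE': 0.4, 'O': 0.6}, 'O': {'NE': 0.2, 'O': 0.8}}
emit_p = {
    'NE': {'john': 0.3, 'goes': 0.1, 'ny': 0.6},
    'O': {'john': 0.1, 'goes': 0.8, 'ny': 0.1},
}
obs = ['john', 'goes', 'ny']

posteriors, likelihood = forward_backward(obs, states, start_p, trans_p, emit_p)
for word, dist in zip(obs, posteriors):
    print(word, dist)
```

Each printed distribution sums to 1 and gives the probability that the word at that position was produced by each hidden state, given the whole observation sequence.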
There are three steps involved in performing NER using an HMM: Annotation, HMM train, and HMM test. The Annotation module converts raw text into annotated, trainable data. During HMM train, we compute the HMM parameters: start probability, transition probability, and emission probability. During HMM test, the Viterbi algorithm is used to find the optimal tag sequence.
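The HMM test step can be illustrated with a compact Viterbi decoder. This is a minimal sketch rather than NLTK's implementation: the two-state model (NE for named entity, O for other) and its probabilities are assumed values for illustration, `viterbi` is a hypothetical helper, and raw probabilities are multiplied directly, which is fine for short inputs but would call for log probabilities on long ones.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for obs and its probability."""
    # best[t][s]: probability of the best path ending in state s at position t.
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path

    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p] * trans_p[p][s])
            best[t][s] = best[t - 1][prev] * trans_p[prev][s] * emit_p[s][obs[t]]
            back[t][s] = prev

    # Follow the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Illustrative (assumed) parameters for a two-state model.
states = ('NE', 'O')
start_p = {'NE': 0.5, 'O': 0.5}
trans_p = {'NE': {'NE': 0.4, 'O': 0.6}, 'O': {'NE': 0.2, 'O': 0.8}}
emit_p = {
    'NE': {'john': 0.3, 'goes': 0.1, 'ny': 0.6},
    'O': {'john': 0.1, 'goes': 0.8, 'ny': 0.1},
}

path, prob = viterbi(['john', 'goes', 'ny'], states, start_p, trans_p, emit_p)
print(path, prob)
```

With these assumed parameters, the decoder labels "john" and "ny" as NE and "goes" as O.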
Consider an example of POS tagging using the HMM in NLTK. From the tagged output, NP and VP chunks can be obtained through chunking, and the NP chunks can be processed further to obtain proper nouns or named entities:
>>> import nltk
>>> nltk.tag.hmm.demo_pos()
HMM POS tagging demo
Training HMM...
Testing...
Test: the/AT fulton/NP county/NN grand/JJ jury/NN said/VBD friday/NR an/AT investigation/NN of/IN atlanta's/NP$ recent/JJ primary/NN election/NN produced/VBD ``/`` no/AT evidence/NN ''/'' that/CS any/DTI irregularities/NNS took/VBD place/NN ./.
Untagged: the fulton county grand jury said friday an investigation of atlanta's recent primary election produced `` no evidence '' that any irregularities took place .
HMM-tagged: the/AT fulton/NP county/NN grand/JJ jury/NN said/VBD friday/NR an/AT investigation/NN of/IN atlanta's/NP$ recent/JJ primary/NN election/NN produced/VBD ``/`` no/AT evidence/NN ''/'' that/CS any/DTI irregularities/NNS took/VBD place/NN ./.
Entropy: 18.7331739705
------------------------------------------------------------
Test: the/AT jury/NN further/RBR said/VBD in/IN term-end/NN presentments/NNS that/CS the/AT city/NN executive/JJ committee/NN ,/, which/WDT had/HVD over-all/JJ charge/NN of/IN the/AT election/NN ,/, ``/`` deserves/VBZ the/AT praise/NN and/CC thanks/NNS of/IN the/AT city/NN of/IN atlanta/NP ''/'' for/IN the/AT manner/NN in/IN which/WDT the/AT election/NN was/BEDZ conducted/VBN ./.
Untagged: the jury further said in term-end presentments that the city executive committee , which had over-all charge of the election , `` deserves the praise and thanks of the city of atlanta '' for the manner in which the election was conducted .
HMM-tagged: the/AT jury/NN further/RBR said/VBD in/IN term-end/AT presentments/NN that/CS the/AT city/NN executive/NN committee/NN ,/, which/WDT had/HVD over-all/VBN charge/NN of/IN the/AT election/NN ,/, ``/`` deserves/VBZ the/AT praise/NN and/CC thanks/NNS of/IN the/AT city/NN of/IN atlanta/NP ''/'' for/IN the/AT manner/NN in/IN which/WDT the/AT election/NN was/BEDZ conducted/VBN ./.
Entropy: 27.0708725519
------------------------------------------------------------
Test: the/AT september-october/NP term/NN jury/NN had/HVD been/BEN charged/VBN by/IN fulton/NP superior/JJ court/NN judge/NN durwood/NP pye/NP to/TO investigate/VB reports/NNS of/IN possible/JJ ``/`` irregularities/NNS ''/'' in/IN the/AT hard-fought/JJ primary/NN which/WDT was/BEDZ won/VBN by/IN mayor-nominate/NN ivan/NP allen/NP jr./NP ./.
Untagged: the september-october term jury had been charged by fulton superior court judge durwood pye to investigate reports of possible `` irregularities '' in the hard-fought primary which was won by mayor-nominate ivan allen jr. .
HMM-tagged: the/AT september-october/JJ term/NN jury/NN had/HVD been/BEN charged/VBN by/IN fulton/NP superior/JJ court/NN judge/NN durwood/TO pye/VB to/TO investigate/VB reports/NNS of/IN possible/JJ ``/`` irregularities/NNS ''/'' in/IN the/AT hard-fought/JJ primary/NN which/WDT was/BEDZ won/VBN by/IN mayor-nominate/NP ivan/NP allen/NP jr./NP ./.
Entropy: 33.8281874237
------------------------------------------------------------
Test: ``/`` only/RB a/AT relative/JJ handful/NN of/IN such/JJ reports/NNS was/BEDZ received/VBN ''/'' ,/, the/AT jury/NN said/VBD ,/, ``/`` considering/IN the/AT widespread/JJ interest/NN in/IN the/AT election/NN ,/, the/AT number/NN of/IN voters/NNS and/CC the/AT size/NN of/IN this/DT city/NN ''/'' ./.
Untagged: `` only a relative handful of such reports was received '' , the jury said , `` considering the widespread interest in the election , the number of voters and the size of this city '' .
HMM-tagged: ``/`` only/RB a/AT relative/JJ handful/NN of/IN such/JJ reports/NNS was/BEDZ received/VBN ''/'' ,/, the/AT jury/NN said/VBD ,/, ``/`` considering/IN the/AT widespread/JJ interest/NN in/IN the/AT election/NN ,/, the/AT number/NN of/IN voters/NNS and/CC the/AT size/NN of/IN this/DT city/NN ''/'' ./.
Entropy: 11.4378198596
------------------------------------------------------------
Test: the/AT jury/NN said/VBD it/PPS did/DOD find/VB that/CS many/AP of/IN georgia's/NP$ registration/NN and/CC election/NN laws/NNS ``/`` are/BER outmoded/JJ or/CC inadequate/JJ and/CC often/RB ambiguous/JJ ''/'' ./.
Untagged: the jury said it did find that many of georgia's registration and election laws `` are outmoded or inadequate and often ambiguous '' .
HMM-tagged: the/AT jury/NN said/VBD it/PPS did/DOD find/VB that/CS many/AP of/IN georgia's/NP$ registration/NN and/CC election/NN laws/NNS ``/`` are/BER outmoded/VBG or/CC inadequate/JJ and/CC often/RB ambiguous/VB ''/'' ./.
Entropy: 20.8163623192
------------------------------------------------------------
Test: it/PPS recommended/VBD that/CS fulton/NP legislators/NNS act/VB ``/`` to/TO have/HV these/DTS laws/NNS studied/VBN and/CC revised/VBN to/IN the/AT end/NN of/IN modernizing/VBG and/CC improving/VBG them/PPO ''/'' ./.
Untagged: it recommended that fulton legislators act `` to have these laws studied and revised to the end of modernizing and improving them '' .
HMM-tagged: it/PPS recommended/VBD that/CS fulton/NP legislators/NNS act/VB ``/`` to/TO have/HV these/DTS laws/NNS studied/VBD and/CC revised/VBD to/IN the/AT end/NN of/IN modernizing/NP and/CC improving/VBG them/PPO ''/'' ./.
Entropy: 20.3244921203
------------------------------------------------------------
Test: the/AT grand/JJ jury/NN commented/VBD on/IN a/AT number/NN of/IN other/AP topics/NNS ,/, among/IN them/PPO the/AT atlanta/NP and/CC fulton/NP county/NN purchasing/VBG departments/NNS which/WDT it/PPS said/VBD ``/`` are/BER well/QL operated/VBN and/CC follow/VB generally/RB accepted/VBN practices/NNS which/WDT inure/VB to/IN the/AT best/JJT interest/NN of/IN both/ABX governments/NNS ''/'' ./.
Untagged: the grand jury commented on a number of other topics , among them the atlanta and fulton county purchasing departments which it said `` are well operated and follow generally accepted practices which inure to the best interest of both governments '' .
HMM-tagged: the/AT grand/JJ jury/NN commented/VBD on/IN a/AT number/NN of/IN other/AP topics/NNS ,/, among/IN them/PPO the/AT atlanta/NP and/CC fulton/NP county/NN purchasing/NN departments/NNS which/WDT it/PPS said/VBD ``/`` are/BER well/RB operated/VBN and/CC follow/VB generally/RB accepted/VBN practices/NNS which/WDT inure/VBZ to/IN the/AT best/JJT interest/NN of/IN both/ABX governments/NNS ''/'' ./.
Entropy: 31.3834231469
------------------------------------------------------------
Test: merger/NN proposed/VBN
Untagged: merger proposed
HMM-tagged: merger/PPS proposed/VBD
Entropy: 5.6718203946
------------------------------------------------------------
Test: however/WRB ,/, the/AT jury/NN said/VBD it/PPS believes/VBZ ``/`` these/DTS two/CD offices/NNS should/MD be/BE combined/VBN to/TO achieve/VB greater/JJR efficiency/NN and/CC reduce/VB the/AT cost/NN of/IN administration/NN ''/'' ./.
Untagged: however , the jury said it believes `` these two offices should be combined to achieve greater efficiency and reduce the cost of administration '' .
HMM-tagged: however/WRB ,/, the/AT jury/NN said/VBD it/PPS believes/VBZ ``/`` these/DTS two/CD offices/NNS should/MD be/BE combined/VBN to/TO achieve/VB greater/JJR efficiency/NN and/CC reduce/VB the/AT cost/NN of/IN administration/NN ''/'' ./.
Entropy: 8.27545943909
------------------------------------------------------------
Test: the/AT city/NN purchasing/VBG department/NN ,/, the/AT jury/NN said/VBD ,/, ``/`` is/BEZ lacking/VBG in/IN experienced/VBN clerical/JJ personnel/NNS as/CS a/AT result/NN of/IN city/NN personnel/NNS policies/NNS ''/'' ./.
Untagged: the city purchasing department , the jury said , `` is lacking in experienced clerical personnel as a result of city personnel policies '' .
HMM-tagged: the/AT city/NN purchasing/NN department/NN ,/, the/AT jury/NN said/VBD ,/, ``/`` is/BEZ lacking/VBG in/IN experienced/AT clerical/JJ personnel/NNS as/CS a/AT result/NN of/IN city/NN personnel/NNS policies/NNS ''/'' ./.
Entropy: 16.7622537278
------------------------------------------------------------
accuracy over 284 tokens: 92.96
The output of an NER tagger is referred to as the response, and the interpretation of human beings as the answer key. So, we provide the following definitions:
Performance of an NER-based system can be judged by using the following parameters:
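$$P = \frac{\text{Correct}}{\text{Correct} + \text{Incorrect} + \text{Spurious}}$$
$$R = \frac{\text{Correct}}{\text{Correct} + \text{Incorrect} + \text{Missing}}$$
$$F\text{-Measure} = \frac{2 \cdot P \cdot R}{P + R}$$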
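These measures are straightforward to compute once the four counts are known. The sketch below follows the usual MUC-style convention, in which precision is measured against everything the tagger produced (Correct + Incorrect + Spurious) and recall against everything in the answer key (Correct + Incorrect + Missing). It assumes that Correct and Incorrect count responses that match or fail to match the key, Missing counts key entities with no response, and Spurious counts responses with no key entity. The function name `ner_scores` and the example counts are hypothetical.

```python
def ner_scores(correct, incorrect, missing, spurious):
    """Return (precision, recall, f_measure) from MUC-style counts."""
    actual = correct + incorrect + spurious    # entities in the response
    possible = correct + incorrect + missing   # entities in the answer key
    precision = correct / actual if actual else 0.0
    recall = correct / possible if possible else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up counts for illustration.
precision, recall, f_measure = ner_scores(correct=8, incorrect=1, missing=1, spurious=2)
print(round(precision, 4), round(recall, 4), round(f_measure, 4))
```

With these counts, precision is 8/11, recall is 8/10, and the F-measure is 16/21.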
NER can be performed using the following approaches:
Experiments have shown that machine learning-based approaches generally outperform rule-based approaches. Also, if a combination of rule-based and machine learning-based approaches is used, the performance of NER can improve further.
Using POS tagging, NER can be performed. The POS tags that can be used are as follows (they are available at https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html):
Tag | Description |
---|---|
CC | Coordinating conjunction |
CD | Cardinal number |
DT | Determiner |
EX | Existential there |
FW | Foreign word |
IN | Preposition or subordinating conjunction |
JJ | Adjective |
JJR | Adjective, comparative |
JJS | Adjective, superlative |
LS | List item marker |
MD | Modal |
NN | Noun, singular or mass |
NNS | Noun, plural |
NNP | Proper noun, singular |
NNPS | Proper noun, plural |
PDT | Predeterminer |
POS | Possessive ending |
PRP | Personal pronoun |
PRP$ | Possessive pronoun |
RB | Adverb |
RBR | Adverb, comparative |
RBS | Adverb, superlative |
RP | Particle |
SYM | Symbol |
TO | To |
UH | Interjection |
VB | Verb, base form |
VBD | Verb, past tense |
VBG | Verb, gerund or present participle |
VBN | Verb, past participle |
VBP | Verb, non-3rd person singular present |
VBZ | Verb, 3rd person singular present |
WDT | Wh-determiner |
WP | Wh-pronoun |
WP$ | Possessive wh-pronoun |
WRB | Wh-adverb |
If POS tagging is performed, then named entities can be identified using the POS information. The tokens tagged with the NNP tag are treated as named entities.
Consider the following example in NLTK in which POS tagging is used to perform NER:
>>> import nltk
>>> from nltk import pos_tag, word_tokenize
>>> pos_tag(word_tokenize("John and Smith are going to NY and Germany"))
[('John', 'NNP'), ('and', 'CC'), ('Smith', 'NNP'), ('are', 'VBP'), ('going', 'VBG'), ('to', 'TO'), ('NY', 'NNP'), ('and', 'CC'), ('Germany', 'NNP')]
Here, the named entities are John, Smith, NY, and Germany, since they are tagged with the NNP tag.
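Picking out the NNP tokens from a tagged sentence can be automated with a small helper. The following is a minimal sketch that works on the tagged output shown above and needs no NLTK models. The function `proper_noun_chunks` is hypothetical, and the choice to merge adjacent proper nouns into one candidate entity (so that a multi-word name stays together) and to include the plural tag NNPS are assumptions of this sketch.

```python
def proper_noun_chunks(tagged_tokens, tags=('NNP', 'NNPS')):
    """Group runs of consecutive proper-noun tokens into candidate entities."""
    chunks, current = [], []
    for word, tag in tagged_tokens:
        if tag in tags:
            current.append(word)
        elif current:
            # A non-proper-noun token closes the current run.
            chunks.append(' '.join(current))
            current = []
    if current:
        chunks.append(' '.join(current))
    return chunks


tagged = [('John', 'NNP'), ('and', 'CC'), ('Smith', 'NNP'), ('are', 'VBP'),
          ('going', 'VBG'), ('to', 'TO'), ('NY', 'NNP'), ('and', 'CC'),
          ('Germany', 'NNP')]
entities = proper_noun_chunks(tagged)
print(entities)
```

This approach finds candidate entities but does not classify them as persons, locations, or organizations; that still requires a chunker or an NER tagger.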
Let's see another example in which POS tagging is performed in NLTK and the POS tag information is used to detect Named Entities:
>>> import nltk
>>> from nltk.corpus import brown
>>> from nltk.tag import UnigramTagger
>>> tagger = UnigramTagger(brown.tagged_sents(categories='news')[:700])
>>> sentence = ['John', 'and', 'Smith', 'went', 'to', 'NY', 'and', 'Germany']
>>> for word, tag in tagger.tag(sentence):
...     print(word, '->', tag)
...
John -> NP
and -> CC
Smith -> None
went -> VBD
to -> TO
NY -> None
and -> CC
Germany -> None
Here, John has been tagged with the NP tag (the Brown corpus tag for a proper noun), so it is identified as a named entity. Some of the tokens are tagged None because they did not occur in the training data.
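Why unseen tokens come back as None is easy to see in a stripped-down unigram tagger. The sketch below is a pure-Python illustration, not NLTK's `UnigramTagger`. The helpers `train_unigram` and `tag_tokens` and the tiny training set are hypothetical. The `default` argument plays a role loosely similar to a backoff tagger by supplying a fallback tag for unknown words.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sentences):
    """Map each word seen in training to its most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tag_counts.most_common(1)[0][0]
            for word, tag_counts in counts.items()}


def tag_tokens(model, tokens, default=None):
    """Tag tokens by lookup; words absent from the model get the default tag."""
    return [(token, model.get(token, default)) for token in tokens]


# Tiny illustrative training set using Brown-style tags.
train = [
    [('John', 'NP'), ('went', 'VBD'), ('to', 'TO'), ('school', 'NN')],
    [('Mary', 'NP'), ('and', 'CC'), ('John', 'NP'), ('left', 'VBD')],
]
model = train_unigram(train)

tokens = ['John', 'and', 'Smith', 'went', 'to', 'NY']
print(tag_tokens(model, tokens))                # unseen words are tagged None
print(tag_tokens(model, tokens, default='NN'))  # fallback tag for unseen words
```

Because the tagger is only a lookup table built from the training sentences, Smith and NY cannot receive a tag unless a fallback is supplied or more training data is used.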