CFG was defined for natural languages in 1957 by Noam Chomsky. A CFG consists of the following components:
A→a
CFG rules are of two types—Phrase structure rules and Sentence structure rules.
A Phrase Structure Rule can be defined as follows—A→a, where A Î N and a consists of Terminals and Non terminals.
In Sentence level Construction of CFG, there are four structures:
General CFG rules are summarized here:
Consider an example that depicts the use of Context-free Grammar rules in NLTK:
>>> import nltk >>> from nltk import Nonterminal, nonterminals, Production, CFG >>> nonterminal1 = Nonterminal('NP') >>> nonterminal2 = Nonterminal('VP') >>> nonterminal3 = Nonterminal('PP') >>> nonterminal1.symbol() 'NP' >>> nonterminal2.symbol() 'VP' >>> nonterminal3.symbol() 'PP' >>> nonterminal1==nonterminal2 False >>> nonterminal2==nonterminal3 False >>> nonterminal1==nonterminal3 False >>> S, NP, VP, PP = nonterminals('S, NP, VP, PP') >>> N, V, P, DT = nonterminals('N, V, P, DT') >>> production1 = Production(S, [NP, VP]) >>> production2 = Production(NP, [DT, NP]) >>> production3 = Production(VP, [V, NP,NP,PP]) >>> production1.lhs() S >>> production1.rhs() (NP, VP) >>> production3.lhs() VP >>> production3.rhs() (V, NP, NP, PP) >>> production3 == Production(VP, [V,NP,NP,PP]) True >>> production2 == production3 False
An example for accessing ATIS grammar in NLTK is as follows:
>>> import nltk >>> gram1 = nltk.data.load('grammars/large_grammars/atis.cfg') >>> gram1 <Grammar with 5517 productions>
Extract the testing sentences from ATIS as follows:
>>> import nltk >>> sent = nltk.data.load('grammars/large_grammars/atis_sentences.txt') >>> sent = nltk.parse.util.extract_test_sentences(sent) >>> len(sent) 98 >>> testingsent=sent[25] >>> testingsent[1] 11 >>> testingsent[0] ['list', 'those', 'flights', 'that', 'stop', 'over', 'in', 'salt', 'lake', 'city', '.'] >>> sent=testingsent[0]
Bottom-up parsing:
>>> import nltk >>> gram1 = nltk.data.load('grammars/large_grammars/atis.cfg') >>> sent = nltk.data.load('grammars/large_grammars/atis_sentences.txt') >>> sent = nltk.parse.util.extract_test_sentences(sent) >>> testingsent=sent[25] >>> sent=testingsent[0] >>> parser1 = nltk.parse.BottomUpChartParser(gram1) >>> chart1 = parser1.chart_parse(sent) >>> print((chart1.num_edges())) 13454 >>> print((len(list(chart1.parses(gram1.start()))))) 11
Bottom-up, Left Corner parsing:
>>> import nltk >>> gram1 = nltk.data.load('grammars/large_grammars/atis.cfg') >>> sent = nltk.data.load('grammars/large_grammars/atis_sentences.txt') >>> sent = nltk.parse.util.extract_test_sentences(sent) >>> testingsent=sent[25] >>> sent=testingsent[0] >>> parser2 = nltk.parse.BottomUpLeftCornerChartParser(gram1) >>> chart2 = parser2.chart_parse(sent) >>> print((chart2.num_edges())) 8781 >>> print((len(list(chart2.parses(gram1.start()))))) 11
Left Corner parsing with a Bottom-up filter:
>>> import nltk >>> gram1 = nltk.data.load('grammars/large_grammars/atis.cfg') >>> sent = nltk.data.load('grammars/large_grammars/atis_sentences.txt') >>> sent = nltk.parse.util.extract_test_sentences(sent) >>> testingsent=sent[25] >>> sent=testingsent[0] >>> parser3 = nltk.parse.LeftCornerChartParser(gram1) >>> chart3 = parser3.chart_parse(sent) >>> print((chart3.num_edges())) 1280 >>> print((len(list(chart3.parses(gram1.start()))))) 11
>>> import nltk >>> gram1 = nltk.data.load('grammars/large_grammars/atis.cfg') >>> sent = nltk.data.load('grammars/large_grammars/atis_sentences.txt') >>> sent = nltk.parse.util.extract_test_sentences(sent) >>> testingsent=sent[25] >>> sent=testingsent[0] >>>parser4 = nltk.parse.TopDownChartParser(gram1) >>> chart4 = parser4.chart_parse(sent) >>> print((chart4.num_edges())) 37763 >>> print((len(list(chart4.parses(gram1.start()))))) 11
Incremental Bottom-up parsing:
>>> import nltk >>> gram1 = nltk.data.load('grammars/large_grammars/atis.cfg') >>> sent = nltk.data.load('grammars/large_grammars/atis_sentences.txt') >>> sent = nltk.parse.util.extract_test_sentences(sent) >>> testingsent=sent[25] >>> sent=testingsent[0] >>> parser5 = nltk.parse.IncrementalBottomUpChartParser(gram1) >>> chart5 = parser5.chart_parse(sent) >>> print((chart5.num_edges())) 13454 >>> print((len(list(chart5.parses(gram1.start()))))) 11
Incremental Bottom-up, Left Corner parsing:
>>> import nltk >>> gram1 = nltk.data.load('grammars/large_grammars/atis.cfg') >>> sent = nltk.data.load('grammars/large_grammars/atis_sentences.txt') >>> sent = nltk.parse.util.extract_test_sentences(sent) >>> testingsent=sent[25] >>> sent=testingsent[0] >>> parser6 = nltk.parse.IncrementalBottomUpLeftCornerChartParser(gram1) >>> chart6 = parser6.chart_parse(sent) >>> print((chart6.num_edges())) 8781 >>> print((len(list(chart6.parses(gram1.start()))))) 11
Incremental Left Corner parsing with a Bottom-up filter:
>>> import nltk >>> gram1 = nltk.data.load('grammars/large_grammars/atis.cfg') >>> sent = nltk.data.load('grammars/large_grammars/atis_sentences.txt') >>> sent = nltk.parse.util.extract_test_sentences(sent) >>> testingsent=sent[25] >>> sent=testingsent[0] >>> parser7 = nltk.parse.IncrementalLeftCornerChartParser(gram1) >>> chart7 = parser7.chart_parse(sent) >>> print((chart7.num_edges())) 1280 >>> print((len(list(chart7.parses(gram1.start()))))) 11
Incremental Top-down parsing:
>>> import nltk >>> gram1 = nltk.data.load('grammars/large_grammars/atis.cfg') >>> sent = nltk.data.load('grammars/large_grammars/atis_sentences.txt') >>> sent = nltk.parse.util.extract_test_sentences(sent) >>> testingsent=sent[25] >>> sent=testingsent[0] >>> parser8 = nltk.parse.IncrementalTopDownChartParser(gram1) >>> chart8 = parser8.chart_parse(sent) >>> print((chart8.num_edges())) 37763 >>> print((len(list(chart8.parses(gram1.start()))))) 11
>>> import nltk >>> gram1 = nltk.data.load('grammars/large_grammars/atis.cfg') >>> sent = nltk.data.load('grammars/large_grammars/atis_sentences.txt') >>> sent = nltk.parse.util.extract_test_sentences(sent) >>> testingsent=sent[25] >>> sent=testingsent[0] >>> parser9 = nltk.parse.EarleyChartParser(gram1) >>> chart9 = parser9.chart_parse(sent) >>> print((chart9.num_edges())) 37763 >>> print((len(list(chart9.parses(gram1.start()))))) 11