Earley chart parsing algorithm

The Earley algorithm was introduced by Jay Earley in 1970. Like top-down parsing, it predicts constituents before scanning the words that realize them, but because it records partial results in a chart it can handle left-recursive rules and does not require the grammar to be in Chomsky Normal Form (CNF). The parser fills in the chart from left to right across the input, repeatedly predicting rules, scanning input tokens, and completing constituents whose right-hand sides have been fully recognized.
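
Before walking through NLTK's demo output, here is a minimal sketch (my own illustration, not NLTK's demo code) of invoking the Earley parser directly on a small grammar. Note that the rules NP -> NP PP and VP -> VP PP are left-recursive, and the Earley parser accepts them without any grammar transformation:

import nltk
from nltk.parse.earleychart import EarleyChartParser

grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> NP PP | Det Noun | 'I'
    VP -> VP PP | Verb NP | Verb
    PP -> Prep NP
    Det -> 'a'
    Noun -> 'dog'
    Verb -> 'saw'
    Prep -> 'with'
""")

parser = EarleyChartParser(grammar)
# parse() returns an iterator over the parse trees licensed by the grammar.
for tree in parser.parse('I saw a dog'.split()):
    print(tree)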

Consider an example that illustrates parsing using the Earley chart parser:

>>> import nltk
>>> nltk.parse.earleychart.demo(print_times=False, trace=1, sent='I saw a dog', numparses=2)
* Sentence:
I saw a dog
['I', 'saw', 'a', 'dog']

|.    I    .   saw   .    a    .   dog   .|
|[---------]         .         .         .| [0:1] 'I'
|.         [---------]         .         .| [1:2] 'saw'
|.         .         [---------]         .| [2:3] 'a'
|.         .         .         [---------]| [3:4] 'dog'
|>         .         .         .         .| [0:0] S  -> * NP VP
|>         .         .         .         .| [0:0] NP -> * NP PP
|>         .         .         .         .| [0:0] NP -> * Det Noun
|>         .         .         .         .| [0:0] NP -> * 'I'
|[---------]         .         .         .| [0:1] NP -> 'I' *
|[--------->         .         .         .| [0:1] S  -> NP * VP
|[--------->         .         .         .| [0:1] NP -> NP * PP
|.         >         .         .         .| [1:1] VP -> * VP PP
|.         >         .         .         .| [1:1] VP -> * Verb NP
|.         >         .         .         .| [1:1] VP -> * Verb
|.         >         .         .         .| [1:1] Verb -> * 'saw'
|.         [---------]         .         .| [1:2] Verb -> 'saw' *
|.         [--------->         .         .| [1:2] VP -> Verb * NP
|.         [---------]         .         .| [1:2] VP -> Verb *
|[-------------------]         .         .| [0:2] S  -> NP VP *
|.         [--------->         .         .| [1:2] VP -> VP * PP
|.         .         >         .         .| [2:2] NP -> * NP PP
|.         .         >         .         .| [2:2] NP -> * Det Noun
|.         .         >         .         .| [2:2] Det -> * 'a'
|.         .         [---------]         .| [2:3] Det -> 'a' *
|.         .         [--------->         .| [2:3] NP -> Det * Noun
|.         .         .         >         .| [3:3] Noun -> * 'dog'
|.         .         .         [---------]| [3:4] Noun -> 'dog' *
|.         .         [-------------------]| [2:4] NP -> Det Noun *
|.         [-----------------------------]| [1:4] VP -> Verb NP *
|.         .         [------------------->| [2:4] NP -> NP * PP
|[=======================================]| [0:4] S  -> NP VP *
|.         [----------------------------->| [1:4] VP -> VP * PP
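
In this trace, each line is an edge: the [i:j] span shows which part of the input the edge covers, and the * (dot) inside the production shows how much of its right-hand side has been recognized, so [0:1] S -> NP * VP means an NP spanning the first word has been found and a VP is still being sought. The final edge [0:4] S -> NP VP * covers the whole sentence and therefore represents a complete parse. To inspect these edges programmatically rather than read the trace, something along the following lines can be used (a sketch, continuing with the parser and grammar from the example above):

chart = parser.chart_parse('I saw a dog'.split())
print(chart.num_edges(), 'edges in the chart')
for edge in chart.edges():
    print(edge)                              # dotted rules with their spans
for tree in chart.parses(grammar.start()):
    print(tree)                              # complete parses rooted at S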

Consider an example that illustrates parsing using the Chart parser in NLTK:

>>> import nltk
>>> nltk.parse.chart.demo(2, print_times=False, trace=1, sent='John saw a dog', numparses=1)
* Sentence:
John saw a dog
['John', 'saw', 'a', 'dog']

* Strategy: Bottom-up

|.   John  .   saw   .    a    .   dog   .|
|[---------]         .         .         .| [0:1] 'John'
|.         [---------]         .         .| [1:2] 'saw'
|.         .         [---------]         .| [2:3] 'a'
|.         .         .         [---------]| [3:4] 'dog'
|>         .         .         .         .| [0:0] NP -> * 'John'
|[---------]         .         .         .| [0:1] NP -> 'John' *
|>         .         .         .         .| [0:0] S  -> * NP VP
|>         .         .         .         .| [0:0] NP -> * NP PP
|[--------->         .         .         .| [0:1] S  -> NP * VP
|[--------->         .         .         .| [0:1] NP -> NP * PP
|.         >         .         .         .| [1:1] Verb -> * 'saw'
|.         [---------]         .         .| [1:2] Verb -> 'saw' *
|.         >         .         .         .| [1:1] VP -> * Verb NP
|.         >         .         .         .| [1:1] VP -> * Verb
|.         [--------->         .         .| [1:2] VP -> Verb * NP
|.         [---------]         .         .| [1:2] VP -> Verb *
|.         >         .         .         .| [1:1] VP -> * VP PP
|[-------------------]         .         .| [0:2] S  -> NP VP *
|.         [--------->         .         .| [1:2] VP -> VP * PP
|.         .         >         .         .| [2:2] Det -> * 'a'
|.         .         [---------]         .| [2:3] Det -> 'a' *
|.         .         >         .         .| [2:2] NP -> * Det Noun
|.         .         [--------->         .| [2:3] NP -> Det * Noun
|.         .         .         >         .| [3:3] Noun -> * 'dog'
|.         .         .         [---------]| [3:4] Noun -> 'dog' *
|.         .         [-------------------]| [2:4] NP -> Det Noun *
|.         .         >         .         .| [2:2] S  -> * NP VP
|.         .         >         .         .| [2:2] NP -> * NP PP
|.         [-----------------------------]| [1:4] VP -> Verb NP *
|.         .         [------------------->| [2:4] S  -> NP * VP
|.         .         [------------------->| [2:4] NP -> NP * PP
|[=======================================]| [0:4] S  -> NP VP *
|.         [----------------------------->| [1:4] VP -> VP * PP
Nr edges in chart: 33
(S (NP John) (VP (Verb saw) (NP (Det a) (Noun dog))))
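
This demo uses the bottom-up strategy: lexical edges are added first, and a production such as S -> * NP VP is only predicted once its first right-hand-side symbol (here, the NP for 'John') has been found in the chart. The same strategy can be invoked directly, as in the following sketch (reusing the illustrative grammar defined earlier, so the pronoun 'I' stands in for 'John'):

from nltk.parse.chart import BottomUpChartParser

bu_parser = BottomUpChartParser(grammar)     # grammar as defined in the earlier sketch
chart = bu_parser.chart_parse('I saw a dog'.split())
print('Nr edges in chart:', chart.num_edges())
for tree in chart.parses(grammar.start()):
    print(tree)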

Consider an example that illustrates parsing using the Stepping Chart parser in NLTK:

>>> import nltk
>>> nltk.parse.chart.demo(5, print_times=False, trace=1, sent='John saw a dog', numparses=2)
* Sentence:
John saw a dog
['John', 'saw', 'a', 'dog']

* Strategy: Stepping (top-down vs bottom-up)

*** SWITCH TO TOP DOWN
|[---------]         .         .         .| [0:1] 'John'
|.         [---------]         .         .| [1:2] 'saw'
|.         .         [---------]         .| [2:3] 'a'
|.         .         .         [---------]| [3:4] 'dog'
|>         .         .         .         .| [0:0] S  -> * NP VP
|>         .         .         .         .| [0:0] NP -> * NP PP
|>         .         .         .         .| [0:0] NP -> * Det Noun
|>         .         .         .         .| [0:0] NP -> * 'John'
|[---------]         .         .         .| [0:1] NP -> 'John' *
|[--------->         .         .         .| [0:1] S  -> NP * VP
|[--------->         .         .         .| [0:1] NP -> NP * PP
|.         >         .         .         .| [1:1] VP -> * VP PP
|.         >         .         .         .| [1:1] VP -> * Verb NP
|.         >         .         .         .| [1:1] VP -> * Verb
|.         >         .         .         .| [1:1] Verb -> * 'saw'
|.         [---------]         .         .| [1:2] Verb -> 'saw' *
|.         [--------->         .         .| [1:2] VP -> Verb * NP
|.         [---------]         .         .| [1:2] VP -> Verb *
|[-------------------]         .         .| [0:2] S  -> NP VP *
|.         [--------->         .         .| [1:2] VP -> VP * PP
|.         .         >         .         .| [2:2] NP -> * NP PP
|.         .         >         .         .| [2:2] NP -> * Det Noun
*** SWITCH TO BOTTOM UP
|.         .         >         .         .| [2:2] Det -> * 'a'
|.         .         .         >         .| [3:3] Noun -> * 'dog'
|.         .         [---------]         .| [2:3] Det -> 'a' *
|.         .         .         [---------]| [3:4] Noun -> 'dog' *
|.         .         [--------->         .| [2:3] NP -> Det * Noun
|.         .         [-------------------]| [2:4] NP -> Det Noun *
|.         [-----------------------------]| [1:4] VP -> Verb NP *
|.         .         [------------------->| [2:4] NP -> NP * PP
|[=======================================]| [0:4] S  -> NP VP *
|.         [----------------------------->| [1:4] VP -> VP * PP
|.         .         >         .         .| [2:2] S  -> * NP VP
|.         .         [------------------->| [2:4] S  -> NP * VP
*** SWITCH TO TOP DOWN
|.         .         .         .         >| [4:4] VP -> * VP PP
|.         .         .         .         >| [4:4] VP -> * Verb NP
|.         .         .         .         >| [4:4] VP -> * Verb
*** SWITCH TO BOTTOM UP
*** SWITCH TO TOP DOWN
*** SWITCH TO BOTTOM UP
*** SWITCH TO TOP DOWN
*** SWITCH TO BOTTOM UP
*** SWITCH TO TOP DOWN
*** SWITCH TO BOTTOM UP
Nr edges in chart: 37
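
The stepping parser alternates between the top-down and bottom-up rule sets, which is why the trace switches back and forth; once neither strategy can contribute a new edge, the remaining switches add nothing. To get a feel for how much work each strategy does on its own, the two can be compared directly, as in this sketch (again reusing the grammar from the first sketch):

from nltk.parse.chart import TopDownChartParser, BottomUpChartParser

tokens = 'I saw a dog'.split()
for parser_class in (TopDownChartParser, BottomUpChartParser):
    chart = parser_class(grammar).chart_parse(tokens)
    print(parser_class.__name__, 'edges:', chart.num_edges())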

Let's look at feature chart parsing in NLTK:

>>> import nltk
>>> nltk.parse.featurechart.demo(print_times=False, print_grammar=True, parser=nltk.parse.featurechart.FeatureChartParser, sent='I saw a dog')

Grammar with 18 productions (start state = S[])
    S[] -> NP[] VP[]
    PP[] -> Prep[] NP[]
    NP[] -> NP[] PP[]
    VP[] -> VP[] PP[]
    VP[] -> Verb[] NP[]
    VP[] -> Verb[]
    NP[] -> Det[pl=?x] Noun[pl=?x]
    NP[] -> 'John'
    NP[] -> 'I'
    Det[] -> 'the'
    Det[] -> 'my'
    Det[-pl] -> 'a'
    Noun[-pl] -> 'dog'
    Noun[-pl] -> 'cookie'
    Verb[] -> 'ate'
    Verb[] -> 'saw'
    Prep[] -> 'with'
    Prep[] -> 'under'

* FeatureChartParser
Sentence: I saw a dog
|. I .saw. a .dog.|
|[---]   .   .   .| [0:1] 'I'
|.   [---]   .   .| [1:2] 'saw'
|.   .   [---]   .| [2:3] 'a'
|.   .   .   [---]| [3:4] 'dog'
|[---]   .   .   .| [0:1] NP[] -> 'I' *
|[--->   .   .   .| [0:1] S[] -> NP[] * VP[] {}
|[--->   .   .   .| [0:1] NP[] -> NP[] * PP[] {}
|.   [---]   .   .| [1:2] Verb[] -> 'saw' *
|.   [--->   .   .| [1:2] VP[] -> Verb[] * NP[] {}
|.   [---]   .   .| [1:2] VP[] -> Verb[] *
|.   [--->   .   .| [1:2] VP[] -> VP[] * PP[] {}
|[-------]   .   .| [0:2] S[] -> NP[] VP[] *
|.   .   [---]   .| [2:3] Det[-pl] -> 'a' *
|.   .   [--->   .| [2:3] NP[] -> Det[pl=?x] * Noun[pl=?x] {?x: False}
|.   .   .   [---]| [3:4] Noun[-pl] -> 'dog' *
|.   .   [-------]| [2:4] NP[] -> Det[-pl] Noun[-pl] *
|.   .   [------->| [2:4] S[] -> NP[] * VP[] {}
|.   .   [------->| [2:4] NP[] -> NP[] * PP[] {}
|.   [-----------]| [1:4] VP[] -> Verb[] NP[] *
|.   [----------->| [1:4] VP[] -> VP[] * PP[] {}
|[===============]| [0:4] S[] -> NP[] VP[] *
(S[]
  (NP[] I)
  (VP[] (Verb[] saw) (NP[] (Det[-pl] a) (Noun[-pl] dog))))
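
In the grammar above, the rule NP[] -> Det[pl=?x] Noun[pl=?x] shares the pl (plural) feature between the determiner and the noun through the variable ?x, so Det[-pl] 'a' can only combine with a singular noun such as 'dog'; the trace shows the binding {?x: False} once the determiner has been completed. A feature grammar of this kind can be built and parsed directly, as sketched below (assuming nltk.grammar.FeatureGrammar together with the FeatureChartParser used by the demo):

from nltk.grammar import FeatureGrammar
from nltk.parse.featurechart import FeatureChartParser

fgrammar = FeatureGrammar.fromstring("""
    % start S
    S[] -> NP[] VP[]
    NP[] -> Det[pl=?x] Noun[pl=?x]
    NP[] -> 'I'
    VP[] -> Verb[] NP[]
    Det[-pl] -> 'a'
    Noun[-pl] -> 'dog'
    Verb[] -> 'saw'
""")

fparser = FeatureChartParser(fgrammar)
for tree in fparser.parse('I saw a dog'.split()):
    print(tree)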

The following demo() function, found in the nltk.parse.earleychart module, shows how NLTK exercises its implementation of the Earley algorithm:

def demo(print_times=True, print_grammar=False,
         print_trees=True, trace=2,
         sent='I saw John with a dog with my cookie', numparses=5):
    """
    A demonstration of the Earley parsers.
    """
    import sys, time
    from nltk.parse.chart import demo_grammar

    # The grammar for ChartParser and SteppingChartParser:
    grammar = demo_grammar()
    if print_grammar:
        print("* Grammar")
        print(grammar)

    # Tokenize the sample sentence.
    print("* Sentence:")
    print(sent)
    tokens = sent.split()
    print(tokens)
    print()

    # Do the parsing.
    earley = EarleyChartParser(grammar, trace=trace)
    t = time.perf_counter()
    chart = earley.chart_parse(tokens)
    parses = list(chart.parses(grammar.start()))
    t = time.perf_counter() - t

    # Print results.
    if numparses:
        assert len(parses)==numparses, 'Not all parses found'
    if print_trees:
        for tree in parses: print(tree)
    else:
        print("Nr trees:", len(parses))
    if print_times:
        print("Time:", t)

if __name__ == '__main__': demo()