Syntactic matching can be performed by chunking. In NLTK, the nltk.chunk.api module provides the chunk parser interface, which identifies chunks and returns a parse tree for a given chunk sequence. The nltk.chunk.named_entity module identifies named entities and also generates a parse structure. Consider the following NLTK code, which builds and manipulates the parse trees used in syntactic matching:
>>> import nltk
>>> from nltk.tree import Tree
>>> print(Tree(1,[2,Tree(3,[4]),5]))
(1 2 (3 4) 5)
>>> ct=Tree('VP',[Tree('V',['gave']),Tree('NP',['her'])])
>>> sent=Tree('S',[Tree('NP',['I']),ct])
>>> print(sent)
(S (NP I) (VP (V gave) (NP her)))
>>> print(sent[1])
(VP (V gave) (NP her))
>>> print(sent[1,1])
(NP her)
>>> t1=Tree.fromstring("(S(NP I) (VP (V gave) (NP her)))")
>>> sent==t1
True
>>> t1[1][1].set_label('X')
>>> t1[1][1].label()
'X'
>>> print(t1)
(S (NP I) (VP (V gave) (X her)))
>>> t1[0],t1[1,1]=t1[1,1],t1[0]
>>> print(t1)
(S (X her) (VP (V gave) (NP I)))
>>> len(t1)
2
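The session above constructs trees by hand. As a minimal sketch of how chunking itself can produce such a tree, the following example uses NLTK's RegexpParser on a sentence that is already part-of-speech tagged. The sentence, its tags, and the single noun-phrase rule are illustrative assumptions and are not taken from the text above.

```python
from nltk import RegexpParser

# A hand-tagged sentence (assumed for illustration), so no tagger model is needed.
tagged = [
    ("the", "DT"), ("little", "JJ"), ("dog", "NN"),
    ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN"),
]

# One illustrative chunk rule: an NP is an optional determiner,
# any number of adjectives, and then a noun.
grammar = "NP: {<DT>?<JJ>*<NN>}"
chunker = RegexpParser(grammar)

# parse() returns an nltk.tree.Tree whose NP subtrees are the chunks.
chunk_tree = chunker.parse(tagged)

# Collect the words of every NP chunk.
np_chunks = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in chunk_tree.subtrees()
    if subtree.label() == "NP"
]

print(chunk_tree)
print(np_chunks)
```

The result is an ordinary Tree, so the indexing, label(), and set_label() operations shown in the session apply to it as well.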