Let's extract all the verbs present in the corpus. In this case, we treat the tags VB, VBD, VBG, VBN, VBP, and VBZ as verb tags.
verbs = []
for tagged_sentence in tagged_wt:
    # Keep only the words whose tag is one of the verb tags
    verbs.append([word for word, tag in tagged_sentence
                  if tag in ['VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ']])
[['extract', 'meaning', 'is', 'analyze'], ['breaking', 'is', 'called', 'are', 'referred'], ['are'], ['has', 'use'], ['is', 'are', 'are', 'are'], ['Using', "'s", 'create', 'counting']]
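Since `verbs` is a list of per-sentence lists, we can flatten it and count verb frequencies, just as we will do for nouns below. A minimal sketch, reusing the output shown above as a hard-coded input for illustration:

```python
from collections import Counter

# The per-sentence verb lists from the output above, hard-coded for illustration
verbs = [['extract', 'meaning', 'is', 'analyze'],
         ['breaking', 'is', 'called', 'are', 'referred'],
         ['are'], ['has', 'use'],
         ['is', 'are', 'are', 'are'],
         ['Using', "'s", 'create', 'counting']]

# Flatten the nested lists and count how often each verb occurs
verb_counts = Counter(v for sentence in verbs for v in sentence)
print(verb_counts.most_common(2))  # [('are', 5), ('is', 3)]
```

A `Counter` built over the flattened stream gives us the corpus-wide frequencies in one pass, without mutating the per-sentence structure.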
Now, let's use spaCy to tokenize a piece of text and access the part-of-speech attribute of each token. As an example application, we'll tokenize the previous paragraph and count the most common nouns with the following code. We'll also lemmatize the tokens, which gives the root form of a word and helps us standardize across inflected forms of a word:
! pip install -q spacy
! pip install -q tabulate
! python -m spacy download en_core_web_lg
from collections import Counter
import spacy
from tabulate import tabulate
nlp = spacy.load('en_core_web_lg')
doc = nlp(text)  # `text` holds the paragraph we want to analyze

# Count the lemma of every token tagged as a noun
noun_counter = Counter(token.lemma_ for token in doc if token.pos_ == 'NOUN')
print(tabulate(noun_counter.most_common(5), headers=['Noun', 'Count']))
Noun Count
----------- -------
step 3
combination 2
text 2
processing 2
datum 2
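Note that the table reports "datum" rather than "data": because we count `token.lemma_` instead of the surface text, inflected forms collapse into a single entry. A toy sketch of that effect, using a small hard-coded lemma dictionary in place of spaCy's real lemmatizer (the tokens and mappings here are purely illustrative):

```python
from collections import Counter

# Illustrative lemma lookup; spaCy derives lemmas from a full lemmatizer component
lemmas = {'steps': 'step', 'step': 'step', 'data': 'datum',
          'texts': 'text', 'text': 'text'}

tokens = ['steps', 'step', 'step', 'data', 'data', 'texts', 'text']

# Counting surface forms keeps 'steps' and 'step' separate...
surface_counts = Counter(tokens)
# ...while counting lemmas merges them into one entry
lemma_counts = Counter(lemmas.get(t, t) for t in tokens)

print(surface_counts.most_common(2))  # [('step', 2), ('data', 2)]
print(lemma_counts.most_common(2))    # [('step', 3), ('datum', 2)]
```

This is why lemma counts give a more faithful picture of how often a concept appears, independent of the grammatical form it takes.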