Let's extract all the verbs present in the corpus. In this case, we treat the tags VB, VBD, VBG, VBN, VBP, and VBZ as verb tags.
verbs = []
for tagged_sentence in tagged_wt:
    # Keep only the words whose tag is one of the verb tags
    verbs.append([word for word, tag in tagged_sentence
                  if tag in ['VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ']])
[['extract', 'meaning', 'is', 'analyze'], ['breaking', 'is', 'called', 'are', 'referred'], ['are'], ['has', 'use'], ['is', 'are', 'are', 'are'], ['Using', "'s", 'create', 'counting']]
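Since `verbs` is a list of per-sentence lists, we can flatten it and count verb frequencies, just as we will do for nouns below. A minimal sketch, reusing the output shown above as a hard-coded input for illustration:

```python
from collections import Counter

# The per-sentence verb lists from the output above, hard-coded for illustration
verbs = [['extract', 'meaning', 'is', 'analyze'],
         ['breaking', 'is', 'called', 'are', 'referred'],
         ['are'], ['has', 'use'],
         ['is', 'are', 'are', 'are'],
         ['Using', "'s", 'create', 'counting']]

# Flatten the nested lists and count how often each verb occurs
verb_counts = Counter(v for sentence in verbs for v in sentence)
print(verb_counts.most_common(2))  # [('are', 5), ('is', 3)]
```

A `Counter` built over the flattened stream gives us the corpus-wide frequencies in one pass, without mutating the per-sentence structure.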
Now, let's use spaCy to tokenize a piece of text and access the part-of-speech attribute of each token. As an example application, we'll tokenize the previous paragraph and count the most common nouns with the following code. We'll also lemmatize the tokens, which gives the root form of a word and helps us standardize across inflected forms of a word:
! pip install -q spacy
! pip install -q tabulate
! python -m spacy download en_core_web_lg
from collections import Counter
import spacy
from tabulate import tabulate
nlp = spacy.load('en_core_web_lg')
doc = nlp(text)  # `text` holds the paragraph we want to analyze

# Count the lemma of every token tagged as a noun
noun_counter = Counter(token.lemma_ for token in doc if token.pos_ == 'NOUN')
print(tabulate(noun_counter.most_common(5), headers=['Noun', 'Count']))
Noun Count
----------- -------
step 3
combination 2
text 2
processing 2
datum 2
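Note that the table reports "datum" rather than "data": because we count `token.lemma_` instead of the surface text, inflected forms collapse into a single entry. A toy sketch of that effect, using a small hard-coded lemma dictionary in place of spaCy's real lemmatizer (the tokens and mappings here are purely illustrative):

```python
from collections import Counter

# Illustrative lemma lookup; spaCy derives lemmas from a full lemmatizer component
lemmas = {'steps': 'step', 'step': 'step', 'data': 'datum',
          'texts': 'text', 'text': 'text'}

tokens = ['steps', 'step', 'step', 'data', 'data', 'texts', 'text']

# Counting surface forms keeps 'steps' and 'step' separate...
surface_counts = Counter(tokens)
# ...while counting lemmas merges them into one entry
lemma_counts = Counter(lemmas.get(t, t) for t in tokens)

print(surface_counts.most_common(2))  # [('step', 2), ('data', 2)]
print(lemma_counts.most_common(2))    # [('step', 3), ('datum', 2)]
```

This is why lemma counts give a more faithful picture of how often a concept appears, independent of the grammatical form it takes.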