The following is a table of all the part-of-speech tags that occur in the treebank corpus distributed with NLTK. The tags and counts shown here were obtained using the following code:
    >>> from nltk.probability import FreqDist
    >>> from nltk.corpus import treebank
    >>> fd = FreqDist()
    >>> for word, tag in treebank.tagged_words():
    ...     fd[tag] += 1
    >>> fd.items()
The FreqDist fd contains the counts shown here for every tag in the treebank corpus. You can inspect an individual tag count with fd[tag], for example, fd['DT']. Punctuation tags are also shown, along with special tags such as -NONE-, which the treebank assigns to empty elements (such as traces) rather than to actual words. Descriptions of most of the tags can be found at the following link:
http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
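The counting loop can be tried without downloading the corpus. The following is a minimal sketch, assuming a small hand-made list of (word, tag) pairs in place of treebank.tagged_words(); the names sample_tagged_words and count_tags are hypothetical. It uses collections.Counter from the standard library, which NLTK 3's FreqDist subclasses, so indexing and most_common() should behave the same way on fd.

```python
from collections import Counter

# Hypothetical hand-made sample standing in for treebank.tagged_words().
sample_tagged_words = [
    ("The", "DT"), ("cat", "NN"), ("sat", "VBD"), ("on", "IN"),
    ("the", "DT"), ("mat", "NN"), (".", "."),
]


def count_tags(tagged_words):
    """Count how often each tag occurs in an iterable of (word, tag) pairs."""
    counts = Counter()
    for _word, tag in tagged_words:
        counts[tag] += 1
    return counts


tag_counts = count_tags(sample_tagged_words)
print(tag_counts["DT"])            # count for a single tag
print(tag_counts["XYZ"])           # a tag that never occurs counts as 0
print(tag_counts.most_common(3))   # tags ordered from most to least frequent
```

Looking up a tag that never occurs returns 0 instead of raising KeyError, which makes it safe to probe for tags you are not sure the corpus contains.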
Part-of-speech tag | Frequency of occurrence |
---|---|
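To regenerate the rows of this table from a tag count, something like the following sketch may help. The helper name format_tag_table and the example counts are hypothetical, and sorting by descending frequency is an assumption, not necessarily the order used in the original table. Because FreqDist supports items() like any Counter, fd from the code above could be passed in directly.

```python
from collections import Counter


def format_tag_table(tag_counts):
    """Render tag counts as a two-column table, most frequent tag first."""
    lines = ["Part-of-speech tag | Frequency of occurrence |", "---|---|"]
    # Sort by descending count, breaking ties alphabetically by tag.
    ordered = sorted(tag_counts.items(), key=lambda item: (-item[1], item[0]))
    for tag, count in ordered:
        lines.append(f"{tag} | {count} |")
    return "\n".join(lines)


# Hypothetical counts for illustration only; these are not treebank figures.
example_counts = Counter({"NN": 2, "DT": 2, "VBD": 1})
print(format_tag_table(example_counts))
```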