
PoS tagging

We can apply an off-the-shelf tagger from NLTK or combine multiple taggers to customize the tagging process. The built-in tagging function, pos_tag, is easy to use directly: pos_tag(input_tokens). Behind the scenes, however, each tag is a prediction from a pre-built supervised learning model, which was trained on a large corpus of correctly tagged words.
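As a minimal sketch of combining taggers, NLTK's UnigramTagger can be chained to a DefaultTagger through its backoff argument. The one-sentence training set and the variable names below are made up for illustration; a usable tagger would be trained on a large tagged corpus, and this is not the model that pos_tag itself uses.

```python
from nltk.tag import DefaultTagger, UnigramTagger

# Tiny hand-made training set (illustrative only; real taggers need a large corpus).
train_sents = [
    [('I', 'PRP'), ('am', 'VBP'), ('reading', 'VBG'),
     ('a', 'DT'), ('book', 'NN'), ('.', '.')],
]

# Fallback tagger: labels every token as a singular noun.
default_tagger = DefaultTagger('NN')

# Unigram tagger: picks the most frequent tag seen for each word in training
# and defers to the fallback tagger for words it has never seen.
combined_tagger = UnigramTagger(train_sents, backoff=default_tagger)

tagged = combined_tagger.tag(['I', 'am', 'reading', 'a', 'novel', '.'])
print(tagged)
```

Here the word novel never appears in the training data, so the unigram tagger hands it to the default tagger, which assigns NN. Longer chains (for example, a bigram tagger backing off to a unigram tagger) follow the same pattern.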

Reusing an earlier example, we can perform PoS tagging as follows:

>>> import nltk
>>> from nltk.tokenize import word_tokenize
>>> tokens = word_tokenize(sent)
>>> print(nltk.pos_tag(tokens))
[('I', 'PRP'), ('am', 'VBP'), ('reading', 'VBG'), ('a', 'DT'), ('book', 'NN'), ('.', '.'), ('It', 'PRP'), ('is', 'VBZ'), ('Python', 'NNP'), ('Machine', 'NNP'), ('Learning', 'NNP'), ('By', 'IN'), ('Example', 'NNP'), (',', ','), ('2nd', 'CD'), ('edition', 'NN'), ('.', '.')]

pos_tag returns each token paired with its PoS tag. We can look up the meaning of a tag with the nltk.help.upenn_tagset function. Looking up PRP and VBP, for example, gives us the following output:

>>> nltk.help.upenn_tagset('PRP')
PRP: pronoun, personal
hers herself him himself hisself it itself me myself one oneself ours ourselves ownself self she thee theirs them themselves they thou thy us
>>> nltk.help.upenn_tagset('VBP')
VBP: verb, present tense, not 3rd person singular
predominate wrap resort sue twist spill cure lengthen brush terminate appear tend stray glisten obtain comprise detest tease attract emphasize mold postpone sever return wag ...
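Once the tag meanings are known, the list of (token, tag) pairs can be filtered by tag. The sketch below copies the nltk.pos_tag output shown earlier and keeps only the nouns, relying on the fact that the Penn Treebank noun tags (NN, NNS, NNP, NNPS) all begin with NN. The variable names are illustrative.

```python
# Output of nltk.pos_tag for the example sentence, copied from the listing above.
tagged = [('I', 'PRP'), ('am', 'VBP'), ('reading', 'VBG'), ('a', 'DT'),
          ('book', 'NN'), ('.', '.'), ('It', 'PRP'), ('is', 'VBZ'),
          ('Python', 'NNP'), ('Machine', 'NNP'), ('Learning', 'NNP'),
          ('By', 'IN'), ('Example', 'NNP'), (',', ','), ('2nd', 'CD'),
          ('edition', 'NN'), ('.', '.')]

# Keep tokens whose tag starts with 'NN' (common and proper nouns).
nouns = [token for token, tag in tagged if tag.startswith('NN')]
print(nouns)
```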

In spaCy, getting a PoS tag is also easy. The token object parsed from an input sentence has an attribute called pos_, which is the tag we are looking for:

>>> print([(token.text, token.pos_) for token in tokens2])
[('I', 'PRON'), ('have', 'VERB'), ('been', 'VERB'), ('to', 'ADP'), ('U.K.', 'PROPN'), ('and', 'CCONJ'), ('U.S.A.', 'PROPN')]
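The pos_ attribute holds coarse-grained tags, which differ from the Penn Treebank tags NLTK returns; spaCy exposes the fine-grained tag separately as token.tag_. As a rough counterpart to NLTK's tag lookup, the sketch below uses spacy.explain to describe the tags from the output above. It needs no language model to be loaded; the variable names are illustrative.

```python
import spacy

# Coarse-grained tags that appear in the spaCy output above.
tags = ['PRON', 'VERB', 'ADP', 'PROPN', 'CCONJ']

# spacy.explain returns a short description, or None for an unknown label.
descriptions = {tag: spacy.explain(tag) for tag in tags}

for tag, description in descriptions.items():
    print(tag, '->', description)
```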