- Python Machine Learning By Example
- Yuxi (Hayden) Liu
- 2021-07-02 12:41:40
PoS tagging
We can apply an off-the-shelf tagger from NLTK or combine multiple taggers to customize the tagging process. The simplest route is the built-in tagging function, pos_tag, called as pos_tag(input_tokens). Behind the scenes, though, each tag is a prediction from a pre-built supervised learning model, trained on a large corpus of words that have been correctly tagged.
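To make the idea of a tagger learned from a tagged corpus more concrete, here is a minimal sketch of about the simplest such model: it memorizes the most frequent tag of each word seen in training and falls back on a default tag for unseen words. This is a toy illustration only. The tiny training corpus and the names train_unigram_tagger and tag_tokens are made up for this sketch, and the model behind NLTK's pos_tag is far more capable.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-tagged training corpus (Penn Treebank style tags).
# A real tagger is trained on a corpus with many thousands of sentences.
TRAIN_SENTS = [
    [('I', 'PRP'), ('am', 'VBP'), ('reading', 'VBG'), ('a', 'DT'), ('book', 'NN'), ('.', '.')],
    [('I', 'PRP'), ('book', 'VBP'), ('a', 'DT'), ('flight', 'NN'), ('.', '.')],
    [('The', 'DT'), ('book', 'NN'), ('is', 'VBZ'), ('good', 'JJ'), ('.', '.')],
]


def train_unigram_tagger(tagged_sents):
    """Learn the most frequent tag for every word seen in training."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag_tokens(tokens, model, default_tag='NN'):
    """Tag each token with its learned tag, backing off to default_tag."""
    return [(tok, model.get(tok.lower(), default_tag)) for tok in tokens]


model = train_unigram_tagger(TRAIN_SENTS)
print(tag_tokens(['I', 'am', 'reading', 'a', 'magazine', '.'], model))
# [('I', 'PRP'), ('am', 'VBP'), ('reading', 'VBG'), ('a', 'DT'), ('magazine', 'NN'), ('.', '.')]
```

The fallback on default_tag mirrors, in miniature, the idea of combining taggers: a more specific tagger handles the words it knows and hands everything else to a simpler one.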
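Reusing an earlier example, we can perform PoS tagging as follows:
>>> import nltk
>>> from nltk.tokenize import word_tokenize
>>> # sent is the sentence from the earlier example
>>> sent = 'I am reading a book. It is Python Machine Learning By Example, 2nd edition.'
>>> tokens = word_tokenize(sent)
>>> print(nltk.pos_tag(tokens))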
[('I', 'PRP'), ('am', 'VBP'), ('reading', 'VBG'), ('a', 'DT'), ('book', 'NN'), ('.', '.'), ('It', 'PRP'), ('is', 'VBZ'), ('Python', 'NNP'), ('Machine', 'NNP'), ('Learning', 'NNP'), ('By', 'IN'), ('Example', 'NNP'), (',', ','), ('2nd', 'CD'), ('edition', 'NN'), ('.', '.')]
Each token is returned together with its PoS tag. We can look up the meaning of a tag with the nltk.help.upenn_tagset function. Looking up PRP and VBP, for example, gives us the following output:
>>> nltk.help.upenn_tagset('PRP')
PRP: pronoun, personal
hers herself him himself hisself it itself me myself one oneself ours ourselves ownself self she thee theirs them themselves they thou thy us
>>> nltk.help.upenn_tagset('VBP')
VBP: verb, present tense, not 3rd person singular
predominate wrap resort sue twist spill cure lengthen brush terminate appear tend stray glisten obtain comprise detest tease attract emphasize mold postpone sever return wag ...
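Because pos_tag returns a plain list of (token, tag) pairs, the result can be processed with ordinary Python. As a small sketch, the snippet below takes the tagged output shown earlier (hard-coded so that it runs without NLTK), counts how often each tag occurs, and pulls out the noun tokens. The variable names are chosen here for illustration.

```python
from collections import Counter

# The (token, tag) pairs printed by nltk.pos_tag(tokens) earlier,
# hard-coded here so the snippet runs on its own.
tagged = [('I', 'PRP'), ('am', 'VBP'), ('reading', 'VBG'), ('a', 'DT'),
          ('book', 'NN'), ('.', '.'), ('It', 'PRP'), ('is', 'VBZ'),
          ('Python', 'NNP'), ('Machine', 'NNP'), ('Learning', 'NNP'),
          ('By', 'IN'), ('Example', 'NNP'), (',', ','), ('2nd', 'CD'),
          ('edition', 'NN'), ('.', '.')]

# How many times does each tag occur?
tag_counts = Counter(tag for _, tag in tagged)
print(tag_counts['NNP'], tag_counts['NN'])
# 4 2

# Penn Treebank noun tags all start with 'NN' (NN, NNS, NNP, NNPS).
nouns = [token for token, tag in tagged if tag.startswith('NN')]
print(nouns)
# ['book', 'Python', 'Machine', 'Learning', 'Example', 'edition']
```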
In spaCy, getting a PoS tag is just as easy. Each token object parsed from an input sentence has an attribute called pos_, which holds the tag we are looking for. Reusing tokens2, the parsed sentence from an earlier example:
>>> print([(token.text, token.pos_) for token in tokens2])
[('I', 'PRON'), ('have', 'VERB'), ('been', 'VERB'), ('to', 'ADP'), ('U.K.', 'PROPN'), ('and', 'CCONJ'), ('U.S.A.', 'PROPN')]
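Note that the two libraries report different tagsets here: the NLTK output above uses Penn Treebank tags such as PRP and NNP, while spaCy's pos_ attribute gives coarse-grained universal categories such as PRON and PROPN. As a rough sketch of how the two relate, the hand-written lookup below maps a handful of Penn Treebank tags to coarse categories. The PENN_TO_COARSE table and the to_coarse helper are illustrative names invented here. They cover only the tags that appear in this section and are not part of either library.

```python
# Hand-written, partial mapping from Penn Treebank tags to coarse
# universal-style categories. Illustrative only: real conversions
# cover many more tags and can depend on context.
PENN_TO_COARSE = {
    'PRP': 'PRON', 'DT': 'DET', 'IN': 'ADP', 'CD': 'NUM',
    'NN': 'NOUN', 'NNP': 'PROPN',
    'VBP': 'VERB', 'VBZ': 'VERB', 'VBG': 'VERB',
    '.': 'PUNCT', ',': 'PUNCT',
}


def to_coarse(tagged_tokens):
    """Replace each fine-grained tag with its coarse category ('X' if unknown)."""
    return [(tok, PENN_TO_COARSE.get(tag, 'X')) for tok, tag in tagged_tokens]


print(to_coarse([('I', 'PRP'), ('am', 'VBP'), ('reading', 'VBG'),
                 ('a', 'DT'), ('book', 'NN'), ('.', '.')]))
# [('I', 'PRON'), ('am', 'VERB'), ('reading', 'VERB'), ('a', 'DET'), ('book', 'NOUN'), ('.', 'PUNCT')]
```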