- Python Machine Learning By Example
- Yuxi (Hayden) Liu
- 271字
- 2021-07-02 12:41:40
PoS tagging
We can apply an off-the-shelf tagger from NLTK or combine multiple taggers to customize the tagging process. It is easy to directly use the built-in tagging function, pos_tag, as in: pos_tag(input_tokens), for instance. But behind the scene, it is actually a prediction from a pre-built supervised learning model. The model is trained based on a large corpus composed of words that are correctly tagged.
Reusing an earlier example, we can perform PoS tagging as follows:
>>> import nltk
>>> tokens = word_tokenize(sent)
>>> print(nltk.pos_tag(tokens))
[('I', 'PRP'), ('am', 'VBP'), ('reading', 'VBG'), ('a', 'DT'), ('book', 'NN'), ('.', '.'), ('It', 'PRP'), ('is', 'VBZ'), ('Python', 'NNP'), ('Machine', 'NNP'), ('Learning', 'NNP'), ('By', 'IN'), ('Example', 'NNP'), (',', ','), ('2nd', 'CD'), ('edition', 'NN'), ('.', '.')]
The PoS tag following each token is returned. We can check the meaning of a tag using the help function. Looking up PRP and VBP, for example, gives us the following output:
>>> nltk.help.upenn_tagset('PRP')
PRP: pronoun, personal
hers herself him himself hisself it itself me myself one oneself ours ourselves ownself self she thee theirs them themselves they thou thy us
>>> nltk.help.upenn_tagset('VBP')
VBP: verb, present tense, not 3rd person singular
predominate wrap resort sue twist spill cure lengthen brush terminate appear tend stray glisten obtain comprise detest tease attract emphasize mold postpone sever return wag ...
In spaCy, getting a PoS tag is also easy. The token object parsed from an input sentence has an attribute called pos_, which is the tag we are looking for:
>>> print([(token.text, token.pos_) for token in tokens2])
[('I', 'PRON'), ('have', 'VERB'), ('been', 'VERB'), ('to', 'ADP'), ('U.K.', 'PROPN'), ('and', 'CCONJ'), ('U.S.A.', 'PROPN')]
- Mastering Hadoop 3
- 控制與決策系統仿真
- ServiceNow Cookbook
- VMware Performance and Capacity Management(Second Edition)
- 樂高創意機器人教程(中級 下冊 10~16歲) (青少年iCAN+創新創意實踐指導叢書)
- Pig Design Patterns
- 西門子S7-200 SMART PLC實例指導學與用
- 運動控制器與交流伺服系統的調試和應用
- Red Hat Linux 9實務自學手冊
- 激光選區熔化3D打印技術
- 水晶石影視動畫精粹:After Effects & Nuke 影視后期合成
- 寒江獨釣:Windows內核安全編程
- WOW!Photoshop CS6完全自學寶典
- FreeCAD [How-to]
- Hands-On Microservices with C#