官术网_书友最值得收藏!

Applications of POS tagging

POS tagging finds applications in Named Entity Recognition (NER), sentiment analysis, question answering, and word sense disambiguation. We will look at an example of word sense disambiguation in the following code. In the sentences I left the room and Left of the room, the word left conveys different meanings. A POS tagger would help to differentiate between the two meanings of the word left. We will now look at how these two different usages of the same word are tagged:

>>> import nltk
>>> text1 = nltk.word_tokenize("I left the room")
>>> text2 = nltk.word_tokenize("Left of the room")
>>> nltk.pos_tag(text1,tagset='universal')
[('I', 'PRON'), ('left', 'VERB'), ('the', 'DET'), ('room', 'NOUN')]
>>> nltk.pos_tag(text2, tagset='universal')
[('Left', 'NOUN'), ('of', 'ADP'), ('the', 'DET'), ('room', 'NOUN')]

In the first example, the word left is a verb, whereas it is a noun in the second example. In NER, POS tagging helps in identifying a person, place, or location, based on the tags. NLTK provides a built-in trained classifier that can identify entities in the text, which works on top of the POS tagged sentences, as shown in the following code:

>>> import nltk
>>> example_sent = nltk.word_tokenize("The company is located in South Africa")
>>> example_sent
['The', 'company', 'is', 'located', 'in', 'South', 'Africa']
>>> tagged_sent = nltk.pos_tag(example_sent)
>>> tagged_sent
[('The', 'DT'), ('company', 'NN'), ('is', 'VBZ'), ('located', 'VBN'), ('in', 'IN'), ('South', 'NNP'), ('Africa', 'NNP')]
>>> nltk.ne_chunk(tagged_sent)
Tree('S', [('The', 'DT'), ('company', 'NN'), ('is', 'VBZ'), ('located', 'VBN'), ('in', 'IN'), Tree('GPE', [('South', 'NNP'), ('Africa', 'NNP')])])

The ne_chunk() function uses the trained named entity chunker to identify South Africa as a geopolitical entity (GPE), in the example sentence. So far, we have seen examples using NLTK's built-in taggers. In the next section, we will look at how to develop our own POS tagger.

主站蜘蛛池模板: 哈尔滨市| 汉源县| 昭觉县| 大同市| 大余县| 沛县| 龙胜| 栾川县| 邯郸县| 巧家县| 收藏| 留坝县| 汾阳市| 铜鼓县| 达日县| 潍坊市| 思南县| 怀柔区| 安溪县| 临猗县| 山西省| 介休市| 广丰县| 武汉市| 昆明市| 图木舒克市| 田东县| 深水埗区| 大化| 商洛市| 嘉峪关市| 莎车县| 大厂| 万源市| 延庆县| 通江县| 舞钢市| 屯留县| 延边| 通许县| 木兰县|