
Applications of POS tagging

POS tagging is used in Named Entity Recognition (NER), sentiment analysis, question answering, and word sense disambiguation. The following code shows an example of word sense disambiguation. In the sentences "I left the room" and "Left of the room", the word left has different meanings, and a POS tagger helps distinguish between them. Let's look at how these two usages of the same word are tagged:

>>> import nltk
>>> text1 = nltk.word_tokenize("I left the room")
>>> text2 = nltk.word_tokenize("Left of the room")
>>> nltk.pos_tag(text1, tagset='universal')
[('I', 'PRON'), ('left', 'VERB'), ('the', 'DET'), ('room', 'NOUN')]
>>> nltk.pos_tag(text2, tagset='universal')
[('Left', 'NOUN'), ('of', 'ADP'), ('the', 'DET'), ('room', 'NOUN')]
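The tag alone does not name a meaning, but it narrows the choice. As a minimal sketch, assuming a small hand-written sense table, the tagged output above can be used to pick a gloss for left. The SENSES dictionary, its glosses, and the sense_of() helper are hypothetical illustrations, not part of NLTK.

```python
# Hypothetical sense inventory keyed by (lowercased word, universal POS tag).
# The glosses are illustrative only; they are not taken from NLTK or WordNet.
SENSES = {
    ("left", "VERB"): "past tense of 'leave': went away from",
    ("left", "NOUN"): "the left-hand side or direction",
}


def sense_of(word, tagged_tokens):
    """Return the gloss for the first occurrence of `word`, chosen by its POS tag.

    Returns None if the word is absent or its (word, tag) pair is not in SENSES.
    """
    for token, tag in tagged_tokens:
        if token.lower() == word.lower():
            return SENSES.get((token.lower(), tag))
    return None


# Tagged output copied from the nltk.pos_tag(..., tagset='universal') calls above.
tagged1 = [('I', 'PRON'), ('left', 'VERB'), ('the', 'DET'), ('room', 'NOUN')]
tagged2 = [('Left', 'NOUN'), ('of', 'ADP'), ('the', 'DET'), ('room', 'NOUN')]

print(sense_of("left", tagged1))  # past tense of 'leave': went away from
print(sense_of("left", tagged2))  # the left-hand side or direction
```

A real system would look senses up in a lexical resource such as WordNet rather than a hard-coded table; the point here is only that the (word, tag) pair separates the two readings.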

In the first example, the word left is a verb; in the second, it is a noun. In NER, POS tags help identify entities such as people, organizations, and locations. NLTK provides a built-in, pretrained named entity chunker that operates on POS-tagged sentences, as shown in the following code:

>>> import nltk
>>> example_sent = nltk.word_tokenize("The company is located in South Africa")
>>> example_sent
['The', 'company', 'is', 'located', 'in', 'South', 'Africa']
>>> tagged_sent = nltk.pos_tag(example_sent)
>>> tagged_sent
[('The', 'DT'), ('company', 'NN'), ('is', 'VBZ'), ('located', 'VBN'), ('in', 'IN'), ('South', 'NNP'), ('Africa', 'NNP')]
>>> nltk.ne_chunk(tagged_sent)
Tree('S', [('The', 'DT'), ('company', 'NN'), ('is', 'VBZ'), ('located', 'VBN'), ('in', 'IN'), Tree('GPE', [('South', 'NNP'), ('Africa', 'NNP')])])
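To see why POS tags are a useful input for NER, here is a deliberately naive sketch that treats each run of consecutive proper-noun tags (NNP or NNPS) as a candidate entity. This is not how ne_chunk() works: it uses a trained classifier and also assigns an entity type such as GPE. The proper_noun_chunks() function is a hypothetical helper, applied to the tagged_sent output shown above.

```python
def proper_noun_chunks(tagged_tokens):
    """Group runs of consecutive NNP/NNPS tokens into candidate entity strings."""
    chunks, current = [], []
    for token, tag in tagged_tokens:
        if tag in ("NNP", "NNPS"):
            current.append(token)  # extend the current proper-noun run
        elif current:
            chunks.append(" ".join(current))  # a non-proper-noun tag ends the run
            current = []
    if current:  # flush a run that ends the sentence
        chunks.append(" ".join(current))
    return chunks


# Tagged output copied from the nltk.pos_tag() call above.
tagged_sent = [('The', 'DT'), ('company', 'NN'), ('is', 'VBZ'), ('located', 'VBN'),
               ('in', 'IN'), ('South', 'NNP'), ('Africa', 'NNP')]
print(proper_noun_chunks(tagged_sent))  # ['South Africa']
```

The heuristic finds the same span that ne_chunk() labels, but it cannot say what kind of entity the span is, and it misses entities that the tagger does not mark as proper nouns.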

The ne_chunk() function uses NLTK's trained named entity chunker to identify South Africa in the example sentence as a geopolitical entity (GPE). So far, we have seen examples that use NLTK's built-in taggers. In the next section, we will look at how to develop our own POS tagger.
