- Python 3 Text Processing with NLTK 3 Cookbook
- Jacob Perkins
- 663 characters
- 2021-09-03 09:45:35
Looking up Synsets for a word in WordNet
WordNet is a lexical database for the English language. In other words, it's a dictionary designed specifically for natural language processing.
NLTK comes with a simple interface to look up words in WordNet. What you get is a list of Synset instances, which are groupings of synonymous words that express the same concept. Many words have only one Synset, but some have several. In this recipe, we'll explore a single Synset, and in the next recipe, we'll look at several in more detail.
Getting ready
Be sure you've unzipped the wordnet corpus at nltk_data/corpora/wordnet. This will allow WordNetCorpusReader to access it.
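If you're not sure whether the corpus is in place, NLTK's installer, nltk.download('wordnet'), will fetch it. Below is a minimal stdlib-only check that assumes the unzipped layout described above. It looks for a corpora/wordnet directory under each candidate data root. The helper name find_wordnet and the example roots are hypothetical. In a real session, nltk.data.path holds the directories NLTK actually searches.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_wordnet(roots: Iterable[str]) -> Optional[Path]:
    """Return the first <root>/corpora/wordnet directory that exists, else None.

    Hypothetical helper: it only checks for the unzipped directory layout
    described in the text and does not validate the files inside it.
    """
    for root in roots:
        candidate = Path(root) / "corpora" / "wordnet"
        if candidate.is_dir():
            return candidate
    return None


if __name__ == "__main__":
    # Illustrative search roots (an assumption); NLTK's real list is nltk.data.path.
    roots = [str(Path.home() / "nltk_data"), "/usr/share/nltk_data"]
    found = find_wordnet(roots)
    print(found if found else "wordnet corpus not found; try nltk.download('wordnet')")
```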
How to do it...
Now we're going to look up the Synset for cookbook and explore some of the properties and methods of a Synset using the following code:
>>> from nltk.corpus import wordnet
>>> syn = wordnet.synsets('cookbook')[0]
>>> syn.name()
'cookbook.n.01'
>>> syn.definition()
'a book of recipes and cooking directions'
How it works...
You can look up any word in WordNet using wordnet.synsets(word) to get a list of Synsets. The list may be empty if the word is not found. The list may also have quite a few elements, as some words can have many possible meanings and, therefore, many Synsets.
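To make that return contract concrete, here is a toy model of the lookup. It is not NLTK's implementation. A hypothetical in-memory index maps each word to the names of its Synsets, and the lookup always returns a list, which is empty for an unknown word. Only the entry cookbook.n.01 comes from the session above. The index and the function are illustrative assumptions. The real reader also reduces inflected forms to their base form before the lookup, and this sketch skips that step.

```python
from typing import Dict, List

# Hypothetical toy index: word -> names of its Synsets.
# Only 'cookbook.n.01' is taken from the text; the index itself is illustrative.
TOY_INDEX: Dict[str, List[str]] = {
    "cookbook": ["cookbook.n.01"],
}


def synsets(word: str, index: Dict[str, List[str]] = TOY_INDEX) -> List[str]:
    """Return every Synset name for `word`, or an empty list if it is unknown."""
    # Lowercase first, as NLTK's synsets() also does; return a copy so
    # callers cannot mutate the index.
    return list(index.get(word.lower(), []))


print(synsets("cookbook"))  # ['cookbook.n.01']
print(synsets("xyzzy"))     # []
```

Because the result may be empty, indexing it with [0], as the session above does, raises IndexError for an unknown word. Check the list before you take its first element.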
There's more...
Each Synset in the list has a number of methods you can use to learn more about it. The name() method returns a unique name for the Synset, which you can pass to wordnet.synset() to get the Synset directly:
>>> wordnet.synset('cookbook.n.01')
Synset('cookbook.n.01')
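A Synset name such as cookbook.n.01 packs three dot-separated fields: the lemma, the part-of-speech tag, and a zero-padded sense number. Multiword lemmas use underscores, as in reference_book. The sketch below splits a name into those parts. The helper parse_synset_name and the SynsetName tuple are hypothetical, not part of NLTK. The helper splits from the right, so a lemma that contains a dot still comes through intact.

```python
from typing import NamedTuple


class SynsetName(NamedTuple):
    lemma: str
    pos: str
    sense: int


def parse_synset_name(name: str) -> SynsetName:
    """Split 'lemma.pos.NN' into its three parts (hypothetical helper)."""
    # rsplit with maxsplit=2 keeps any dots inside the lemma itself.
    lemma, pos, sense = name.rsplit(".", 2)
    return SynsetName(lemma, pos, int(sense))


print(parse_synset_name("cookbook.n.01"))
# SynsetName(lemma='cookbook', pos='n', sense=1)
print(parse_synset_name("reference_book.n.01").lemma.replace("_", " "))
# reference book
```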
The definition() method should be self-explanatory. There is also an examples() method, which returns a list of phrases that use the word in context. The list is empty when a Synset has no examples:
>>> wordnet.synsets('cooking')[0].examples()
['cooking can be a great art', 'people are needed who have experience in cookery', 'he left the preparation of meals to his wife']
Working with hypernyms
Synsets are organized in a structure similar to that of an inheritance tree. More abstract terms are known as hypernyms and more specific terms are hyponyms. This tree can be traced all the way up to a root hypernym.
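The tree can be modeled as a mapping from each Synset to its direct hypernyms. Hyponyms are then the inverse of that mapping, and a root hypernym is any Synset that has no hypernym above it. The sketch below illustrates this idea and is not NLTK's data structure. The chain is the hypernym path of cookbook.n.01 that hypernym_paths() returns later in this recipe. The two siblings come from the hyponyms of reference_book.n.01 listed in the next session. The mapping and the function names are hypothetical.

```python
from typing import Dict, List

# Toy hypernym map: synset -> its direct hypernyms.
CHAIN = [
    "entity.n.01", "physical_entity.n.01", "object.n.01", "whole.n.02",
    "artifact.n.01", "creation.n.02", "product.n.02", "work.n.02",
    "publication.n.01", "book.n.01", "reference_book.n.01", "cookbook.n.01",
]
HYPERNYMS: Dict[str, List[str]] = {CHAIN[0]: []}
for parent, child in zip(CHAIN, CHAIN[1:]):
    HYPERNYMS[child] = [parent]
HYPERNYMS["atlas.n.02"] = ["reference_book.n.01"]
HYPERNYMS["handbook.n.01"] = ["reference_book.n.01"]


def hypernyms(synset: str) -> List[str]:
    return HYPERNYMS.get(synset, [])


def hyponyms(synset: str) -> List[str]:
    # Invert the map: every synset that lists `synset` as a direct hypernym.
    return sorted(s for s, parents in HYPERNYMS.items() if synset in parents)


def root_hypernyms(synset: str) -> List[str]:
    # Walk upward; a synset with no hypernyms is its own root.
    parents = hypernyms(synset)
    if not parents:
        return [synset]
    roots: List[str] = []
    for parent in parents:
        for root in root_hypernyms(parent):
            if root not in roots:
                roots.append(root)
    return roots


print(hypernyms("cookbook.n.01"))       # ['reference_book.n.01']
print(hyponyms("reference_book.n.01"))  # ['atlas.n.02', 'cookbook.n.01', 'handbook.n.01']
print(root_hypernyms("cookbook.n.01"))  # ['entity.n.01']
```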
Hypernyms provide a way to categorize and group words based on their similarity to each other. The Calculating WordNet Synset similarity recipe details the functions used to calculate similarity based on the distance between two words in the hypernym tree. Here, we move one level up from cookbook, back down to all the hyponyms of that hypernym, and then up to the root:
>>> syn.hypernyms()
[Synset('reference_book.n.01')]
>>> syn.hypernyms()[0].hyponyms()
[Synset('annual.n.02'), Synset('atlas.n.02'), Synset('cookbook.n.01'), Synset('directory.n.01'), Synset('encyclopedia.n.01'), Synset('handbook.n.01'), Synset('instruction_book.n.01'), Synset('source_book.n.01'), Synset('wordbook.n.01')]
>>> syn.root_hypernyms()
[Synset('entity.n.01')]
As you can see, reference_book is a hypernym of cookbook, but cookbook is only one of the many hyponyms of reference_book. And all these types of books have the same root hypernym, which is entity, one of the most abstract terms in the English language. You can trace the entire path from entity down to cookbook using the hypernym_paths() method, as follows:
>>> syn.hypernym_paths()
[[Synset('entity.n.01'), Synset('physical_entity.n.01'), Synset('object.n.01'), Synset('whole.n.02'), Synset('artifact.n.01'), Synset('creation.n.02'), Synset('product.n.02'), Synset('work.n.02'), Synset('publication.n.01'), Synset('book.n.01'), Synset('reference_book.n.01'), Synset('cookbook.n.01')]]
The hypernym_paths() method returns a list of lists, where each list starts at the root hypernym and ends with the original Synset. Most of the time, you'll get only one nested list of Synsets. A Synset that inherits from more than one hypernym yields more than one path.
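The list-of-lists shape follows from a simple recursion. A Synset with no hypernyms contributes a single one-element path, and every other Synset extends each path of each of its hypernyms. The sketch below applies that recursion to a hypothetical four-node graph. The names root, left, right and leaf are placeholders, not WordNet Synsets. In this graph, leaf has two hypernyms, which is the case that produces more than one path. The code illustrates the idea and is not NLTK's source.

```python
from typing import Dict, List

# Hypothetical toy graph: synset -> direct hypernyms. 'leaf' has two
# hypernyms, so it should yield two root-to-leaf paths.
HYPERNYMS: Dict[str, List[str]] = {
    "root": [],
    "left": ["root"],
    "right": ["root"],
    "leaf": ["left", "right"],
}


def hypernym_paths(synset: str, graph: Dict[str, List[str]] = HYPERNYMS) -> List[List[str]]:
    """Return every path from a root hypernym down to `synset`."""
    parents = graph.get(synset, [])
    if not parents:
        # A root: the only path is the synset itself.
        return [[synset]]
    paths: List[List[str]] = []
    for parent in parents:
        # Extend each path that reaches the parent with the synset itself.
        for path in hypernym_paths(parent, graph):
            paths.append(path + [synset])
    return paths


print(hypernym_paths("leaf"))
# [['root', 'left', 'leaf'], ['root', 'right', 'leaf']]
print(len(hypernym_paths("left")))  # 1
```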
Part of speech (POS)
You can also look up a simplified part-of-speech tag as follows:
>>> syn.pos()
'n'
There are four common part-of-speech tags (or POS tags) found in WordNet, as shown in the following table:

Part of speech    Tag
Noun              n
Adjective         a
Adverb            r
Verb              v
These POS tags can be used to look up specific Synsets for a word. For example, the word 'great' can be used as a noun or an adjective. In WordNet, 'great' has 1 noun Synset and 6 adjective Synsets, as shown in the following code:
>>> len(wordnet.synsets('great'))
7
>>> len(wordnet.synsets('great', pos='n'))
1
>>> len(wordnet.synsets('great', pos='a'))
6
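The pos argument narrows the same list that an unfiltered lookup returns. In the session above, the per-tag counts (1 + 6) add up to the unfiltered total of 7. Below is a toy version of that filter. The Synset names are hypothetical placeholders, not WordNet's real entries for 'great'. One assumption needs flagging. WordNet also tags some adjectives 's' (adjective satellite), and this sketch treats pos='a' as matching both 'a' and 's'. We understand NLTK's adjective filter to behave the same way.

```python
from typing import List, Optional

# Hypothetical Synset names for one word; placeholders, not real WordNet data.
NAMES: List[str] = [
    "placeholder.n.01",
    "placeholder.v.01",
    "placeholder.a.01",
    "placeholder.s.01",
    "placeholder.s.02",
]


def pos_of(name: str) -> str:
    # The POS tag is the middle field of 'lemma.pos.NN'.
    return name.rsplit(".", 2)[1]


def filter_by_pos(names: List[str], pos: Optional[str] = None) -> List[str]:
    """Keep names whose tag matches `pos`; None keeps everything.

    Assumption: pos='a' also matches adjective satellites tagged 's'.
    """
    if pos is None:
        return list(names)
    wanted = {"a", "s"} if pos == "a" else {pos}
    return [n for n in names if pos_of(n) in wanted]


print(len(filter_by_pos(NAMES)))           # 5
print(len(filter_by_pos(NAMES, pos="n")))  # 1
print(len(filter_by_pos(NAMES, pos="a")))  # 3
```

Because the four tags n, v, a and r (with satellites folded into a) partition the list, the filtered counts always sum to the unfiltered count.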
These POS tags will come up again in the Using WordNet for tagging recipe in Chapter 4, Part-of-speech Tagging.
See also
In the next two recipes, we'll explore lemmas and how to calculate Synset similarity. And in Chapter 2, Replacing and Correcting Words, we'll use WordNet for lemmatization, synonym replacement, and then explore the use of antonyms.