官术网_书友最值得收藏!

Looking up lemmas and synonyms in WordNet

Building on the previous recipe, we can also look up lemmas in WordNet to find synonyms of a word. A lemma (in linguistics), is the canonical form or morphological form of a word.

How to do it...

In the following code, we'll find that there are two lemmas for the cookbook Synset using the lemmas() method:

>>> from nltk.corpus import wordnet
>>> syn = wordnet.synsets('cookbook')[0]
>>> lemmas = syn.lemmas()
>>> len(lemmas)
2
>>> lemmas[0].name()
'cookbook'
>>> lemmas[1].name()
'cookery_book'
>>> lemmas[0].synset() == lemmas[1].synset()
True

How it works...

As you can see, cookery_book and cookbook are two distinct lemmas in the same Synset. In fact, a lemma can only belong to a single Synset. In this way, a Synset represents a group of lemmas that all have the same meaning, while a lemma represents a distinct word form.

There's more...

Since all the lemmas in a Synset have the same meaning, they can be treated as synonyms. So if you wanted to get all synonyms for a Synset, you could do the following:

>>> [lemma.name() for lemma in syn.lemmas()]
['cookbook', 'cookery_book']

All possible synonyms

As mentioned earlier, many words have multiple Synsets because the word can have different meanings depending on the context. But, let's say you didn't care about the context, and wanted to get all the possible synonyms for a word:

>>> synonyms = []
>>> for syn in wordnet.synsets('book'):
...     for lemma in syn.lemmas():
...         synonyms.append(lemma.name())
>>> len(synonyms)
38

As you can see, there appears to be 38 possible synonyms for the word 'book'. But in fact, some synonyms are verb forms, and many synonyms are just different usages of 'book'. If, instead, we take the set of synonyms, there are fewer unique words, as shown in the following code:

>>> len(set(synonyms))
25

Antonyms

Some lemmas also have antonyms. The word good, for example, has 27 Synsets, five of which have lemmas with antonyms, as shown in the following code:

>>> gn2 = wordnet.synset('good.n.02')
>>> gn2.definition()
'moral excellence or admirableness'
>>> evil = gn2.lemmas()[0].antonyms()[0]
>>> evil.name
'evil'
>>> evil.synset().definition()
'the quality of being morally wrong in principle or practice'
>>> ga1 = wordnet.synset('good.a.01')
>>> ga1.definition()
'having desirable or positive qualities especially those suitable for a thing specified'
>>> bad = ga1.lemmas()[0].antonyms()[0]
>>> bad.name()
'bad'
>>> bad.synset().definition()
'having undesirable or negative qualities'

The antonyms() method returns a list of lemmas. In the first case, as we can see in the previous code, the second Synset for good as a noun is defined as moral excellence, and its first antonym is evil, defined as morally wrong. In the second case, when good is used as an adjective to describe positive qualities, the first antonym is bad, which describes negative qualities.

See also

In the next recipe, we'll learn how to calculate Synset similarity. Then in Chapter 2, Replacing and Correcting Words, we'll revisit lemmas for lemmatization, synonym replacement, and antonym replacement.

主站蜘蛛池模板: 磴口县| 安仁县| 海盐县| 台东市| 玉门市| 改则县| 安龙县| 博白县| 赤壁市| 厦门市| 河北省| 克拉玛依市| 柞水县| 资源县| 双城市| 南华县| 呼伦贝尔市| 藁城市| 古田县| 河南省| 和田市| 奉新县| 西安市| 鹤岗市| 云阳县| 澎湖县| 肃南| 巴林左旗| 平江县| 田阳县| 万载县| 杭锦后旗| 博客| 科尔| 玉屏| 玉溪市| 大荔县| 红河县| 湖南省| 临安市| 准格尔旗|