
Replacing negations with antonyms

The opposite of synonym replacement is antonym replacement. An antonym is a word that has the opposite meaning of another word. This time, instead of creating custom word mappings, we can use WordNet to replace words with unambiguous antonyms. Refer to the Looking up lemmas and synonyms in WordNet recipe in Chapter 1, Tokenizing Text and WordNet Basics, for more details on antonym lookups.

How to do it...

Let's say you have a sentence such as "let's not uglify our code". With antonym replacement, you can replace "not uglify" with "beautify", resulting in the sentence "let's beautify our code". To do this, we will create an AntonymReplacer class in replacers.py as follows:

from nltk.corpus import wordnet

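class AntonymReplacer(object):
  def replace(self, word, pos=None):
    # Collect the names of all antonym lemmas across every Synset of the word.
    antonyms = set()
    for syn in wordnet.synsets(word, pos=pos):
      for lemma in syn.lemmas():
        for antonym in lemma.antonyms():
          antonyms.add(antonym.name())
    # Only a single, unambiguous antonym is a safe replacement.
    if len(antonyms) == 1:
      return antonyms.pop()
    else:
      return None

  def replace_negations(self, sent):
    i, l = 0, len(sent)
    words = []
    while i < l:
      word = sent[i]
      # On 'not', try to replace it and the following word with one antonym.
      if word == 'not' and i+1 < l:
        ant = self.replace(sent[i+1])
        if ant:
          words.append(ant)
          i += 2  # skip both 'not' and the negated word
          continue
      # Otherwise keep the current word unchanged.
      words.append(word)
      i += 1
    return words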

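Now, we can tokenize the original sentence into ["let's", 'not', 'uglify', 'our', 'code'] and pass this to the replace_negations() method. Here are some examples. Note that replace('good') prints nothing because it returns None: 'good' has no single unambiguous antonym in WordNet.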

>>> from replacers import AntonymReplacer
>>> replacer = AntonymReplacer()
>>> replacer.replace('good')
>>> replacer.replace('uglify')
'beautify'
>>> sent = ["let's", 'not', 'uglify', 'our', 'code']
>>> replacer.replace_negations(sent)
["let's", 'beautify', 'our', 'code']

How it works...

The AntonymReplacer class has two methods: replace() and replace_negations(). The replace() method takes a single word and an optional part-of-speech tag, then looks up the Synsets for the word in WordNet. Going through all the Synsets and every lemma of each Synset, it creates a set of all antonyms found. If only one antonym is found, then it is an unambiguous replacement. If there is more than one antonym, which can happen quite often, then we don't know for sure which antonym is correct. In the case of multiple antonyms (or no antonyms), replace() returns None as it cannot make a decision.
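The decision rule can be tried without the WordNet corpus. The following is a minimal sketch that applies the same "exactly one candidate" test to a small hand-written table. The table TOY_ANTONYMS and the function unambiguous_antonym() are hypothetical names introduced for illustration, and the entries are made up rather than taken from WordNet.

```python
# Hypothetical stand-in for WordNet: each word maps to one list of
# antonym names per sense. The entries are illustrative only.
TOY_ANTONYMS = {
    'uglify': [['beautify']],
    'good': [['evil'], ['bad'], ['ill']],
    'code': [[]],
}

def unambiguous_antonym(word, table=TOY_ANTONYMS):
    """Return the single antonym of word, or None if there are zero or several."""
    antonyms = set()
    for sense in table.get(word, []):
        antonyms.update(sense)
    # Same rule as AntonymReplacer.replace(): accept exactly one candidate.
    return antonyms.pop() if len(antonyms) == 1 else None

print(unambiguous_antonym('uglify'))  # beautify
print(unambiguous_antonym('good'))    # None (several candidates)
print(unambiguous_antonym('code'))    # None (no candidates)
```

Because the candidates are gathered in a set, the same antonym reached through several senses still counts only once.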

In replace_negations(), we look through a tokenized sentence for the word not. If not is found, then we try to find an antonym for the next word using replace(). If we find an antonym, then it is appended to the list of words, replacing not and the original word. All other words are appended as is, resulting in a tokenized sentence with unambiguous negations replaced by their antonyms.
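The edge cases of this scan are easier to see with a stub lookup in place of WordNet. The sketch below restates the loop as a standalone function. The name replace_negations_with() and the dictionary-based lookup are assumptions made for this example and are not part of replacers.py.

```python
def replace_negations_with(lookup, sent):
    """Mirror of replace_negations() that takes the antonym lookup as a callable."""
    words = []
    i = 0
    while i < len(sent):
        word = sent[i]
        if word == 'not' and i + 1 < len(sent):
            ant = lookup(sent[i + 1])
            if ant:
                # Emit the antonym in place of both 'not' and the next word.
                words.append(ant)
                i += 2
                continue
        words.append(word)
        i += 1
    return words

# Hypothetical lookup: only 'uglify' has an unambiguous antonym.
lookup = {'uglify': 'beautify'}.get

print(replace_negations_with(lookup, ["let's", 'not', 'uglify', 'our', 'code']))
# ["let's", 'beautify', 'our', 'code']
print(replace_negations_with(lookup, ['this', 'is', 'not', 'good']))
# ['this', 'is', 'not', 'good']  ('not' is kept when no antonym is returned)
print(replace_negations_with(lookup, ['why', 'not']))
# ['why', 'not']  (a trailing 'not' has no following word)
```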

There's more...

As unambiguous antonyms aren't very common in WordNet, you might want to create a custom antonym mapping in the same way we did for synonyms. This AntonymWordReplacer can be constructed by inheriting from both WordReplacer and AntonymReplacer:

class AntonymWordReplacer(WordReplacer, AntonymReplacer):
  pass

The order of inheritance is very important. Python looks up methods by searching the base classes from left to right, so listing WordReplacer first gives us its initialization and replace() method, combined with the replace_negations() method from AntonymReplacer. The result is a replacer that can perform the following:
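To see which method each class contributes, you can inspect the method resolution order directly. The sketch below uses stand-in classes so that it runs without NLTK: the WordReplacer body is an assumed shape for the class from the previous recipe, the AntonymReplacer methods are stubs, and the Reversed class is a hypothetical counterexample.

```python
class WordReplacer(object):
    """Assumed shape of the mapping-based replacer from the previous recipe."""
    def __init__(self, word_map):
        self.word_map = word_map

    def replace(self, word):
        return self.word_map.get(word, word)


class AntonymReplacer(object):
    """Stub: the real methods are the ones defined earlier in this recipe."""
    def replace(self, word, pos=None):
        return None

    def replace_negations(self, sent):
        return list(sent)


class AntonymWordReplacer(WordReplacer, AntonymReplacer):
    pass


class Reversed(AntonymReplacer, WordReplacer):  # hypothetical, wrong order
    pass


# Base classes are searched left to right, so the first definition wins.
print([c.__name__ for c in AntonymWordReplacer.__mro__])
# ['AntonymWordReplacer', 'WordReplacer', 'AntonymReplacer', 'object']
print(AntonymWordReplacer.replace is WordReplacer.replace)  # True
print(AntonymWordReplacer.replace_negations is AntonymReplacer.replace_negations)  # True
print(Reversed.replace is AntonymReplacer.replace)  # True
```

With the bases reversed, the constructor still accepts a mapping, but replace() resolves to the AntonymReplacer version, so the custom mapping would never be consulted.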

>>> from replacers import AntonymWordReplacer
>>> replacer = AntonymWordReplacer({'evil': 'good'})
>>> replacer.replace_negations(['good', 'is', 'not', 'evil'])
['good', 'is', 'good']

Of course, you can also inherit from CsvWordReplacer or YamlWordReplacer instead of WordReplacer if you want to load the antonym word mappings from a file.

See also

The previous recipe covers the WordReplacer from the perspective of synonym replacement. In Chapter 1, Tokenizing Text and WordNet Basics, WordNet usage is covered in detail in the Looking up Synsets for a word in WordNet and Looking up lemmas and synonyms in WordNet recipes.
