
Named-entity recognition

Given a text sequence, the named-entity recognition (NER) task is to locate and classify words or phrases that belong to predefined categories, such as names of persons, companies, locations, and dates. We will briefly mention it again in Chapter 4, Detecting Spam Email with Naive Bayes.

As an appetizer, let's take a peep at an example of using spaCy for NER.

First, tokenize an input sentence, The book written by Hayden Liu in 2018 was sold at $30 in America, as usual, with the following command:

>>> tokens3 = nlp('The book written by Hayden Liu in 2018 was sold at $30 in America')

The resulting Doc object has an attribute called ents, which holds the recognized named entities. We can extract the text and label of each recognized entity as follows:

>>> print([(token_ent.text, token_ent.label_) for token_ent in tokens3.ents])
[('Hayden Liu', 'PERSON'), ('2018', 'DATE'), ('30', 'MONEY'), ('America', 'GPE')]

We can see from the results that Hayden Liu is tagged as PERSON, 2018 as DATE, 30 as MONEY, and America as GPE (a geopolitical entity, such as a country, city, or state). Please refer to https://spacy.io/api/annotation#section-named-entities for a full list of named entity tags.
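Once the (text, label) pairs are extracted, a common next step is to organize them by tag. The following is a minimal sketch that assumes the pairs look exactly like the output printed above; the helper name group_by_label is hypothetical and is not part of spaCy. It uses only the standard library, so it runs without a language model installed.

```python
from collections import defaultdict

# (text, label) pairs copied from the output printed above.
# With a live spaCy Doc, they would come from the list comprehension over .ents.
entities = [('Hayden Liu', 'PERSON'), ('2018', 'DATE'),
            ('30', 'MONEY'), ('America', 'GPE')]


def group_by_label(pairs):
    """Map each entity label to the list of entity texts that carry it.

    Hypothetical helper for illustration; not a spaCy function.
    """
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


by_label = group_by_label(entities)
print(by_label['PERSON'])   # texts tagged as PERSON
print(sorted(by_label))     # all labels found in the sentence
```

Grouping by label makes it easy to answer questions such as "which people are mentioned?" without scanning the whole list each time.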
