Named-entity recognition

Given a text sequence, the named-entity recognition (NER) task is to locate and identify words or phrases that belong to predefined categories, such as names of persons, companies, locations, and dates. We will briefly mention it again in Chapter 4, Detecting Spam Email with Naive Bayes.

As an appetizer, let's take a peep at an example of using spaCy for NER.

First, tokenize an input sentence, The book written by Hayden Liu in 2018 was sold at $30 in America, as usual, using the following command:

>>> tokens3 = nlp('The book written by Hayden Liu in 2018 was sold at $30 in America')

The resulting Doc object, tokens3, has an attribute called ents, which holds the recognized named entities. We can extract the text and label of each recognized entity as follows:

>>> print([(token_ent.text, token_ent.label_) for token_ent in tokens3.ents])
[('Hayden Liu', 'PERSON'), ('2018', 'DATE'), ('30', 'MONEY'), ('America', 'GPE')]

We can see from the results that Hayden Liu is tagged as PERSON, 2018 as DATE, 30 as MONEY, and America as GPE (a geopolitical entity, such as a country, city, or state). Please refer to https://spacy.io/api/annotation#section-named-entities for a full list of named entity tags.
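
Once the entities are extracted, it is often handy to group them by tag. The following is a minimal sketch in plain Python, not part of spaCy: the helper name group_entities_by_label is hypothetical, and the input is simply the list of (text, label) pairs printed above, copied in as a literal so that the snippet runs without a spaCy model.

```python
from collections import defaultdict


def group_entities_by_label(entities):
    """Group (text, label) pairs into a dict mapping each label to its texts.

    Hypothetical helper for illustration; it is not a spaCy function.
    """
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)
    return dict(grouped)


# The (text, label) pairs from the NER output shown earlier, copied as a literal
entities = [('Hayden Liu', 'PERSON'), ('2018', 'DATE'),
            ('30', 'MONEY'), ('America', 'GPE')]

grouped = group_entities_by_label(entities)
print(grouped)
```

With a loaded pipeline, the same helper could presumably be fed directly with [(ent.text, ent.label_) for ent in tokens3.ents]. If a tag is unfamiliar, spacy.explain can be called with the tag name (for example, spacy.explain('GPE')) to get a short description of it.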
