
Named-entity recognition

Given a text sequence, the named-entity recognition (NER) task is to locate words or phrases that belong to predefined categories, such as names of persons, companies, locations, and dates, and to identify the category of each. We will briefly mention it again in Chapter 4, Detecting Spam Email with Naive Bayes.
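To make "locate and identify" concrete, here is a toy rule-based sketch using only the standard library. It is not how spaCy or any statistical NER model works: the GAZETTEER table, the PATTERNS regular expressions, and the toy_ner function are hypothetical names invented for this illustration, and they only cover this one sample sentence.

```python
import re

# Hypothetical lookup table and patterns, for illustration only.
GAZETTEER = {"Hayden Liu": "PERSON", "America": "GPE"}
PATTERNS = {
    "DATE": re.compile(r"\b(?:19|20)\d{2}\b"),      # four-digit years
    "MONEY": re.compile(r"(?<=\$)\d+(?:\.\d+)?"),   # amount after a dollar sign
}

def toy_ner(text):
    """Return (start, end, phrase, label) tuples sorted by position."""
    found = []
    # Locate known phrases from the gazetteer.
    for phrase, label in GAZETTEER.items():
        for m in re.finditer(re.escape(phrase), text):
            found.append((m.start(), m.end(), m.group(), label))
    # Locate phrases matching the category patterns.
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((m.start(), m.end(), m.group(), label))
    return sorted(found)

sentence = "The book written by Hayden Liu in 2018 was sold at $30 in America"
entities = toy_ner(sentence)
for start, end, phrase, label in entities:
    print(start, end, phrase, label)
```

Each result pairs a character span (the "locate" part) with a category (the "identify" part). Hand-written rules like these break down quickly on unseen text, which is why a trained model is normally used instead.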

As an appetizer, let's take a peep at an example of using spaCy for NER.

First, tokenize the input sentence, The book written by Hayden Liu in 2018 was sold at $30 in America, as usual, using the following command:

>>> tokens3 = nlp('The book written by Hayden Liu in 2018 was sold at $30 in America')

The resulting object has an attribute called ents, which holds the recognized named entities. We can extract the text and the label of each recognized entity as follows:

>>> print([(token_ent.text, token_ent.label_) for token_ent in tokens3.ents])
[('Hayden Liu', 'PERSON'), ('2018', 'DATE'), ('30', 'MONEY'), ('America', 'GPE')]

We can see from the results that Hayden Liu is tagged as PERSON, 2018 as DATE, 30 as MONEY, and America as GPE (geopolitical entity, here a country). Please refer to https://spacy.io/api/annotation#section-named-entities for a full list of named entity tags.
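Once the (text, label) pairs are available, a common next step is to organize them by category. The following is a minimal sketch that works on the output list printed above, so it runs without spaCy; the group_by_label helper is a hypothetical name introduced here, not part of spaCy.

```python
from collections import defaultdict

# (text, label) pairs copied from the spaCy output shown above.
entities = [('Hayden Liu', 'PERSON'), ('2018', 'DATE'),
            ('30', 'MONEY'), ('America', 'GPE')]

def group_by_label(pairs):
    """Map each entity label to the list of entity texts carrying it."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)

by_label = group_by_label(entities)
print(by_label)
```

With a live pipeline, the same helper could be fed the list comprehension over tokens3.ents directly. If a label is unfamiliar, spaCy's spacy.explain function returns a short description of it.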
