Building Your NLP Vocabulary

In the earlier chapters, you were introduced to why Natural Language Processing (NLP) is important, especially in today's context, followed by a discussion of a few prerequisites and Python libraries that are highly beneficial for NLP tasks. In this chapter, we will take this discussion further and look in detail at some of the most concrete tasks involved in building a vocabulary for NLP and preprocessing textual data. We will start by learning what a vocabulary is and then move on to actually building one. We will do this by applying to text data various methods that are present in most NLP pipelines across organizations.

In this chapter, we'll cover the following topics:

  • Lexicons
  • Phonemes, graphemes, and morphemes
  • Tokenization
  • Understanding word normalization
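As a rough preview of where these topics lead, the following sketch strings two of them together in plain Python (standard library only). It tokenizes each document with a naive regular expression, lowercases the text as a minimal form of normalization, counts the tokens, and assigns each kept token an integer ID. The helper names `tokenize` and `build_vocabulary` and the `min_count` threshold are hypothetical, chosen for illustration. This is not the method developed later in the chapter, and the regex assumes plain ASCII English text.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical naive tokenizer: lowercase the text (a minimal normalization step),
    # then treat each run of ASCII letters, digits, or apostrophes as one token.
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(corpus, min_count=1):
    # Hypothetical helper: count tokens across all documents in the corpus.
    counts = Counter()
    for document in corpus:
        counts.update(tokenize(document))
    # Keep tokens seen at least min_count times, ordered by descending frequency,
    # with ties broken alphabetically, then map each token to an integer ID.
    kept = sorted(
        (token for token, count in counts.items() if count >= min_count),
        key=lambda token: (-counts[token], token),
    )
    return {token: index for index, token in enumerate(kept)}


corpus = [
    "The cat sat on the mat.",
    "The dog sat on the log!",
]
vocab = build_vocabulary(corpus)
print(vocab)
# {'the': 0, 'on': 1, 'sat': 2, 'cat': 3, 'dog': 4, 'log': 5, 'mat': 6}
```

Raising `min_count` (for example, `build_vocabulary(corpus, min_count=2)`) shrinks the vocabulary to `{'the': 0, 'on': 1, 'sat': 2}`. Frequency thresholds of this kind are one simple way to keep a vocabulary manageable, at the cost of discarding rare words.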