  • The Natural Language Processing Workshop
  • Rohan Chopra Aniruddha M. Godbole Nipun Sadvilkar Muzaffar Bashir Shah Sohom Ghosh Dwight Gunning
  • 2021-06-11 18:39:24

Introduction

In the previous chapter, we learned about the concepts of Natural Language Processing (NLP) and text analytics. We also took a quick look at various preprocessing steps. In this chapter, we will learn how to make text understandable to machine learning algorithms.

Most machine learning algorithms cannot work directly with plain text or strings, so to apply them to textual data we first need a numerical (vector) representation of that text. Before converting the text into numerical form, however, we need to pass it through preprocessing steps such as tokenization, stemming, lemmatization, and stop-word removal.
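To make these steps concrete, here is a minimal sketch of tokenization, stop-word removal, and stemming in plain Python. The stop-word list and the `naive_stem` helper are hypothetical stand-ins written for illustration only; in practice you would likely rely on a library such as NLTK or spaCy. Lemmatization is left out because it needs a dictionary of word forms.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "on", "of", "and", "to", "in"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Crude suffix stripping; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters would remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem each remaining token."""
    return [naive_stem(t) for t in remove_stop_words(tokenize(text))]


print(preprocess("The cats are sitting on the mats"))
# ['cat', 'sitt', 'mat']
```

Note that the stem "sitt" is not a real word. Stemmers only chop suffixes, which is one reason lemmatization is sometimes preferred.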

In this chapter, we will look more closely at these preprocessing steps and learn how to extract features from the preprocessed text and convert them into vectors. We will explore two popular methods for feature extraction, Bag of Words and Term Frequency-Inverse Document Frequency (TF-IDF), as well as various methods for finding the similarity between texts. By the end of this chapter, you will have gained an in-depth understanding of how text data can be visualized.
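As a rough preview of these ideas, the sketch below builds Bag of Words and TF-IDF vectors for three tiny, made-up documents and compares them with cosine similarity, one common similarity measure. The documents and all function names are assumptions for illustration. The IDF used here is the plain, unsmoothed form log(N / df); libraries such as scikit-learn use smoothed variants, so their numbers will differ.

```python
import math
from collections import Counter

# Three already-preprocessed toy documents (made up for illustration).
docs = [
    ["cat", "sat", "mat"],
    ["dog", "sat", "log"],
    ["cat", "chased", "dog"],
]

# The vocabulary fixes the order of the vector dimensions.
vocab = sorted({token for doc in docs for token in doc})


def bow_vector(tokens):
    """Bag of Words: count how often each vocabulary term occurs."""
    counts = Counter(tokens)
    return [counts[term] for term in vocab]


def idf(term):
    """Unsmoothed inverse document frequency: log(N / df)."""
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df)


def tfidf_vector(tokens):
    """TF-IDF: relative term frequency weighted by IDF."""
    counts = Counter(tokens)
    return [(counts[term] / len(tokens)) * idf(term) for term in vocab]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


print(vocab)
print(bow_vector(docs[0]))
print(cosine_similarity(bow_vector(docs[0]), bow_vector(docs[1])))
print(cosine_similarity(tfidf_vector(docs[0]), tfidf_vector(docs[1])))
```

The first two documents share only the word "sat", so their Bag of Words similarity is modest. Because "sat" appears in two of the three documents, TF-IDF gives it less weight than the rarer words, which lowers the similarity further.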
