Introduction

In the previous chapter, we learned about the concepts of Natural Language Processing (NLP) and text analytics. We also took a quick look at various preprocessing steps. In this chapter, we will learn how to make text understandable to machine learning algorithms.

Most machine learning algorithms cannot work directly with plain text or strings, so to apply them to textual data we first need a numerical, or vector, representation of that data. Before converting text into numerical form, however, we need to pass it through preprocessing steps such as tokenization, stemming, lemmatization, and stop-word removal.
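To make the idea concrete, here is a minimal sketch of two of these steps (tokenization and stop-word removal) followed by a simple count-based vector. It uses only the Python standard library. The stop-word list, the function names `preprocess` and `to_count_vector`, and the sample sentences are illustrative assumptions, and stemming and lemmatization are left out for brevity.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "on", "and", "of", "to", "in"}


def preprocess(text):
    """Lowercase the text, tokenize on runs of letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def to_count_vector(tokens, vocabulary):
    """Return one count per vocabulary word, in vocabulary order."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


docs = ["The cat sat on the mat.", "The dog sat on the log."]
token_lists = [preprocess(d) for d in docs]

# The vocabulary is every distinct token across all documents, sorted
# so that each vector position always refers to the same word.
vocabulary = sorted({t for tokens in token_lists for t in tokens})
vectors = [to_count_vector(tokens, vocabulary) for tokens in token_lists]

print(vocabulary)  # ['cat', 'dog', 'log', 'mat', 'sat']
print(vectors)     # [[1, 0, 0, 1, 1], [0, 1, 1, 0, 1]]
```

Each document becomes a list of numbers of the same length, which is the kind of input most machine learning algorithms expect.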

In this chapter, we will look more closely at these preprocessing steps and learn how to extract features from the preprocessed text and convert them into vectors. We will explore two popular feature extraction methods, Bag of Words and Term Frequency-Inverse Document Frequency (TF-IDF), as well as various methods for finding the similarity between texts. By the end of this chapter, you will also have gained an in-depth understanding of how text data can be visualized.
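As a preview, the following sketch weights terms by Term Frequency-Inverse Document Frequency and compares documents with cosine similarity, one common similarity measure. It assumes one simple, unsmoothed variant of the weighting (relative term frequency multiplied by the natural log of the number of documents divided by the document frequency). Libraries typically use smoothed or normalized variants, so their numbers will differ. The function names and the pre-tokenized sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(token_lists):
    """Build unsmoothed TF-IDF vectors: (count / doc length) * ln(N / df)."""
    vocabulary = sorted({t for tokens in token_lists for t in tokens})
    n_docs = len(token_lists)
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens)
        vector = []
        for word in vocabulary:
            tf = counts[word] / total if total else 0.0
            idf = math.log(n_docs / doc_freq[word])
            vector.append(tf * idf)
        vectors.append(vector)
    return vocabulary, vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Assumed to be already tokenized and stripped of stop words.
token_lists = [
    ["cat", "sat", "mat"],
    ["dog", "sat", "log"],
    ["cat", "chased", "dog"],
]
vocabulary, vectors = tf_idf_vectors(token_lists)

# Documents 0 and 1 share only the word "sat".
print(cosine_similarity(vectors[0], vectors[1]))
```

In this variant, a term that appears in every document gets a weight of zero because ln(1) = 0, which reflects the intuition that such terms do little to distinguish one document from another.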
