
Introduction

In the previous chapter, we learned about the concepts of Natural Language Processing (NLP) and text analytics, and briefly looked at various pre-processing steps. In this chapter, we will learn how to deal with text data, which is mostly unstructured. Unstructured data does not come in a tabular format, and most machine learning algorithms can work only with numbers, so it is essential to convert text into numeric features. We will put more emphasis on steps such as tokenization, stemming, lemmatization, and stop-word removal. You will also learn about two popular methods for feature extraction, bag of words and Term Frequency-Inverse Document Frequency (TF-IDF), as well as various methods for creating new features from existing ones. Finally, you will become familiar with how text data can be visualized.
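
To make the idea of converting text into numeric features concrete, here is a minimal sketch of a bag-of-words representation using only the Python standard library. The function names `tokenize` and `bag_of_words`, the whitespace-based tokenizer, and the two sample sentences are illustrative assumptions, not the implementation used later in this chapter.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bag_of_words(documents):
    """Return (vocabulary, matrix), where matrix[i][j] is the number of
    times vocabulary[j] occurs in documents[i]."""
    tokenized = [tokenize(doc) for doc in documents]
    # The vocabulary is the sorted set of all distinct tokens in the corpus.
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Counter returns 0 for terms that do not occur in this document.
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


# Hypothetical sample corpus, used only for illustration.
docs = ["the cat sat", "the cat saw the dog"]
vocabulary, matrix = bag_of_words(docs)
print(vocabulary)
for row in matrix:
    print(row)
```

Each document becomes one row of counts over a shared vocabulary, which is the kind of tabular, numeric input that most machine learning algorithms expect.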
