Summary

In this chapter, you learned about the various types of data and how to work with unstructured text data. Text data is usually noisy and needs to be cleaned and preprocessed; the main steps are tokenization, stemming, lemmatization, and stop-word removal. After preprocessing, features are extracted from the text using methods such as bag-of-words (BoW) and TF-IDF, which convert unstructured text into structured numeric data. New features can also be created from existing ones, a technique called feature engineering. In the last part of the chapter, we explored ways of visualizing text data, such as word clouds.
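As a compact reminder of how these pieces fit together, the following is a minimal sketch using only the Python standard library. It is not the chapter's own code: the stop-word list, the function names (`preprocess`, `bag_of_words`, `tf_idf`), and the two sample sentences are all assumptions made for illustration. Stemming and lemmatization are left out to keep it short, and the TF-IDF weighting is the plain textbook form, count times log(N / document frequency). Libraries such as scikit-learn apply smoothed variants by default, so their numbers will differ.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "is", "of", "on", "the", "to"}

def preprocess(text):
    """Lowercase, tokenize on runs of letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def bag_of_words(docs):
    """Return the sorted vocabulary and one term-count vector per document."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for tokens in tokenized for t in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    vectors = [[c[term] for term in vocab] for c in counts]
    return vocab, vectors

def tf_idf(vectors):
    """Weight raw counts by log(N / document frequency), unsmoothed."""
    n_docs = len(vectors)
    n_terms = len(vectors[0]) if vectors else 0
    # Number of documents in which each vocabulary term appears.
    doc_freq = [sum(1 for v in vectors if v[j] > 0) for j in range(n_terms)]
    return [
        [v[j] * math.log(n_docs / doc_freq[j]) for j in range(n_terms)]
        for v in vectors
    ]

# Hypothetical sample documents.
docs = ["The cat sat on the mat.", "The dog chased the cat."]
vocab, bow = bag_of_words(docs)
weights = tf_idf(bow)
```

In this sketch, a term that occurs in every document (here, "cat") receives a weight of zero, which is the sense in which TF-IDF down-weights words that do not help tell documents apart.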

In the next chapter, you will learn how to develop machine learning models that classify texts using the feature extraction methods covered in this chapter. Different sampling techniques and model evaluation parameters will also be introduced.
