
Summary

In this chapter, you learned about various types of data and ways to deal with unstructured text data. Text data is usually untidy and needs to be cleaned and pre-processed. The main pre-processing steps are tokenization, stemming, lemmatization, and stop-word removal. After pre-processing, features are extracted from the text using methods such as bag of words (BoW) and term frequency-inverse document frequency (TF-IDF). This step converts unstructured text data into structured numeric data. Feature engineering is then used to create new features from existing ones. In the last part of the chapter, we explored ways of visualizing text data, such as word clouds.
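As a compact recap, the pipeline summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the chapter's own code: the toy corpus, the tiny stop-word list, and the helper names (`preprocess`, `bag_of_words`, `tf_idf`) are all hypothetical, stemming and lemmatization are left out for brevity, and the TF-IDF weighting shown (term frequency times log(N / df)) is only one common variant.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus and stop-word list, for illustration only.
DOCS = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
    "Dogs and cats are pets.",
]
STOP_WORDS = {"the", "on", "and", "are", "a", "an", "is"}


def preprocess(text):
    """Lowercase, tokenize on alphabetic runs, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(docs):
    """Return the sorted vocabulary and one raw-count vector per document."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for tokens in tokenized for t in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(vectors):
    """Weight counts by tf * log(N / df), one common TF-IDF variant."""
    n_docs = len(vectors)
    n_terms = len(vectors[0])
    # Document frequency: in how many documents each term appears.
    df = [sum(1 for v in vectors if v[j] > 0) for j in range(n_terms)]
    weighted = []
    for v in vectors:
        total = sum(v) or 1  # guard against an empty document
        weighted.append(
            [(v[j] / total) * math.log(n_docs / df[j]) for j in range(n_terms)]
        )
    return weighted


vocab, bow = bag_of_words(DOCS)
tfidf = tf_idf(bow)
```

Each row of `bow` and `tfidf` is a fixed-length numeric vector aligned with `vocab`, which is the structured form a classifier can consume. Terms that occur in fewer documents receive a larger TF-IDF weight than terms shared across documents.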

In the next chapter, you will learn how to develop machine learning models that classify texts using the features you learned to extract in this chapter. Different sampling techniques and model evaluation metrics will also be introduced.
