
Summary

In this chapter, we covered common NLP tasks, such as preprocessing and exploratory analysis of text, using the NLTK library. Real-world text is unstructured and needs extensive preprocessing, such as tokenization, stemming, and stop word removal, before it is suitable for ML. As you saw in the examples, NLTK provides an extensive API for carrying out these preprocessing steps. It ships with built-in packages and modules, and it is flexible enough to let you build custom components, such as user-defined stemmers and tokenizers.
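As a compact reminder of how these preprocessing steps fit together, here is a minimal sketch that chains tokenization, stop word removal, and stemming. It assumes NLTK is installed. The `preprocess` helper and the small `STOP_WORDS` set are illustrative names introduced here, not part of NLTK; in practice you would more likely use the list from `nltk.corpus.stopwords`, which needs a one-time data download.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# Tiny hand-written stop word set, used here only so the sketch runs
# without downloading any NLTK corpus data (an assumption for illustration).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in"}

# Keep runs of word characters; punctuation is dropped.
tokenizer = RegexpTokenizer(r"\w+")
stemmer = PorterStemmer()


def preprocess(text):
    """Lowercase, tokenize, remove stop words, then stem each token."""
    tokens = tokenizer.tokenize(text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return [stemmer.stem(t) for t in kept]


print(preprocess("The cats are running in the gardens."))
```

Each stage can be swapped independently: for example, a different tokenizer pattern or another NLTK stemmer can be dropped in without changing the rest of the function.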

We also discussed using NLTK for POS tagging, another common NLP task, which supports problems such as word sense disambiguation and question answering. Applications such as sentiment classification are widely used for their research and business value. We covered some basic examples of text classification in the context of sentiment analysis, for tweets and movie reviews, using the NLTK corpora and sklearn. While these techniques are adequate for simple NLP applications, more complex text classification using deep learning will be explained in subsequent chapters.
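The basic sentiment classification workflow can be sketched in a few lines with sklearn. This is a simplified illustration, not the chapter's exact code: the four training sentences and their labels are made up for the example, and a bag-of-words `CountVectorizer` feeding a `MultinomialNB` classifier is only one reasonable choice of model.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data, invented for illustration only.
train_texts = [
    "I loved this movie, great acting",
    "great fun and a wonderful story",
    "terrible plot and awful acting",
    "I hated this boring movie",
]
train_labels = ["pos", "pos", "neg", "neg"]

# Bag-of-words counts followed by a multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

predictions = model.predict(["a great and wonderful film", "awful and boring plot"])
print(list(predictions))
```

With a real corpus, such as the movie reviews used in this chapter, you would hold out a test split and measure accuracy rather than inspect individual predictions.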
