Summary

In this chapter, we covered common NLP tasks, such as preprocessing and exploratory analysis of text, using the NLTK library. Real-world text is unstructured and needs extensive preprocessing, such as tokenization, stemming, and stop word removal, to make it suitable for ML. As you saw in the examples, NLTK provides an extensive API for carrying out these preprocessing steps. It ships with built-in packages and modules, and it is flexible enough to support custom components, such as user-defined stemmers and tokenizers.
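As a brief recap, the preprocessing steps named above can be sketched as follows. This is a minimal illustration, not the chapter's exact code: the `preprocess` function, the regular expression, and the small `STOP_WORDS` set are assumptions chosen so that the example runs without downloading any NLTK corpora (NLTK's own stop word list lives in `nltk.corpus.stopwords`).

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# Hypothetical, hand-picked stop word set used only to keep the sketch
# free of corpus downloads; it is not NLTK's built-in list.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in"}


def preprocess(text):
    """Tokenize, lowercase, remove stop words, and stem the input text."""
    # A user-defined tokenizer: keep runs of ASCII letters only.
    tokenizer = RegexpTokenizer(r"[A-Za-z]+")
    stemmer = PorterStemmer()
    tokens = [token.lower() for token in tokenizer.tokenize(text)]
    return [stemmer.stem(token) for token in tokens if token not in STOP_WORDS]


tokens = preprocess("The cats are running in the gardens")
print(tokens)
```

Swapping in a different tokenizer pattern or stemmer class changes only the two lines that construct them, which is the kind of flexibility the chapter describes.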

We also discussed using NLTK for POS tagging, another common NLP task, which supports applications such as word sense disambiguation and question answering. Sentiment classification and similar applications are widely used for their research and business value. We covered some basic examples of text classification, in the context of sentiment analysis, for tweets and movie reviews, using the NLTK corpora and sklearn. While these approaches can be used in simple NLP applications, more complex text classification using deep learning will be explained in subsequent chapters.
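To make the classification recap concrete, here is a small sketch of a bag-of-words sentiment classifier built with sklearn. The four training sentences and their labels are invented toy data for illustration only, not the tweet or movie review corpora used in the chapter, and the choice of `CountVectorizer` with `MultinomialNB` is one reasonable assumption among several possible setups.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy examples; a real application would use a labeled corpus.
train_texts = [
    "a wonderful and moving film",
    "great acting and a great story",
    "a dull and boring film",
    "terrible acting and a weak story",
]
train_labels = ["pos", "pos", "neg", "neg"]

# Bag-of-words counts feed a multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

prediction = model.predict(["a wonderful story with great acting"])
print(prediction[0])
```

With so little data the model only reflects which words appeared under each label, which is why larger corpora and richer models are needed for anything beyond a demonstration.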
