
Summary

In this chapter, we covered common NLP tasks, such as preprocessing and exploratory analysis of text, using the NLTK library. Real-world text is unstructured and needs extensive preprocessing, such as tokenization, stemming, and stop word removal, before it is suitable for ML. As you saw in the examples, NLTK provides an extensive API for carrying out these preprocessing steps. It ships with built-in packages and modules, and it is flexible enough to let you build custom modules, such as user-defined stemmers and tokenizers.
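As a compact reminder of how these steps fit together, the following is a minimal sketch rather than the chapter's exact code. It chains a regular-expression tokenizer, stop word removal, and Porter stemming. The `preprocess` function name and the small `STOP_WORDS` set are assumptions made for illustration; NLTK's full stop word list is available through `nltk.corpus.stopwords`, which requires a separate data download.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# Illustrative stop word set (an assumption for this sketch).
# NLTK's own list is in nltk.corpus.stopwords and needs a data download.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in"}


def preprocess(text):
    """Lowercase, tokenize, drop stop words, and stem the remaining tokens."""
    tokenizer = RegexpTokenizer(r"\w+")  # keeps runs of word characters
    stemmer = PorterStemmer()
    tokens = tokenizer.tokenize(text.lower())
    return [stemmer.stem(token) for token in tokens if token not in STOP_WORDS]


print(preprocess("The cats are running in the gardens"))
```

Because `RegexpTokenizer` and `PorterStemmer` are rule-based, this sketch runs without downloading any NLTK corpora or models.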

We also discussed using NLTK for POS tagging, another common NLP task that supports problems such as word sense disambiguation and question answering. Applications such as sentiment classification are widely used for their research and business value. We covered some basic examples of text classification, in the context of sentiment analysis, for tweets and movie reviews, using the NLTK corpora and sklearn. While these techniques can be used in simple NLP applications, more complex text classification, using deep learning, will be explained in subsequent chapters.
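The general shape of such a sentiment classifier can be sketched with sklearn as follows. This is an illustrative sketch, not the chapter's own example: the four training sentences and their labels are made up for demonstration (in practice you would train on a labeled corpus such as the NLTK movie reviews), and the choice of a bag-of-words model with naive Bayes is an assumption.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data, invented purely for illustration.
train_texts = [
    "I loved this movie, great acting",
    "wonderful film and a great story",
    "terrible plot, I hated it",
    "awful acting and a boring film",
]
train_labels = ["pos", "pos", "neg", "neg"]

# Bag-of-words counts feed a multinomial naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

predictions = model.predict(["a great and wonderful story", "boring and terrible"])
print(list(predictions))
```

Wrapping the vectorizer and classifier in a single pipeline keeps the vocabulary learned at training time tied to the model, so new text is transformed the same way at prediction time.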
