
Introduction

In this chapter, we'll cover how to use corpus readers and create custom corpora. If you want to train your own model, such as a part-of-speech tagger or text classifier, you will need to create a custom corpus to train on. Model training is covered in the subsequent chapters.

First, you'll learn how to use the corpus data that ships with NLTK. This is essential for later chapters, where we'll use these corpora as training data. You've already accessed the WordNet corpus in Chapter 1, Tokenizing Text and WordNet Basics. This chapter will introduce you to many more corpora.

We'll also cover creating custom corpus readers, which you can use when your corpus is in a file format that NLTK does not already recognize, or when it is not stored in files at all but in a database such as MongoDB. Before continuing, you should be familiar with tokenization, which was covered in Chapter 1, Tokenizing Text and WordNet Basics.
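As a rough preview of what a corpus reader does, the sketch below points NLTK's `PlaintextCorpusReader` at a throwaway directory of text files and reads the words back. It assumes NLTK is installed. The helper name `load_words`, the file names, and the sample text are made up for illustration and are not taken from this chapter.

```python
import os
import tempfile

from nltk.corpus.reader import PlaintextCorpusReader


def load_words(texts):
    """Write each text to its own file, then read the words back with NLTK.

    `texts` maps a (hypothetical) file name to its contents.
    Returns a dict mapping each file ID to its list of word tokens.
    """
    with tempfile.TemporaryDirectory() as root:
        for name, text in texts.items():
            with open(os.path.join(root, name), "w", encoding="utf8") as f:
                f.write(text)

        # The second argument is a regular expression selecting corpus files.
        reader = PlaintextCorpusReader(root, r".*\.txt")

        # words() uses a regex-based word tokenizer by default, so no extra
        # NLTK data downloads are needed for this call.
        return {fid: list(reader.words(fid)) for fid in reader.fileids()}


if __name__ == "__main__":
    sample = {"a.txt": "Hello, corpus world.", "b.txt": "A second file."}
    for fileid, words in load_words(sample).items():
        print(fileid, words)
```

The reader only needs a root directory and a pattern for the file names. The same interface (`fileids()`, `words()`) is what the built-in NLTK corpora expose, which is why a custom corpus can stand in for them as training data.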
