- Python 3 Text Processing with NLTK 3 Cookbook
- Jacob Perkins
- 219 characters
- 2021-09-03 09:45:34
Introduction
Natural Language Toolkit (NLTK) is a comprehensive Python library for natural language processing and text analytics. Originally designed for teaching, it has been adopted in industry for research and development due to its usefulness and breadth of coverage. NLTK is often used for rapid prototyping of text processing programs and can even be used in production applications. Demos of select NLTK functionality and production-ready APIs are available at http://text-processing.com.
This chapter will cover the basics of tokenizing text and using WordNet. Tokenization is a method of breaking up a piece of text into smaller pieces, such as sentences and words, and is an essential first step for the recipes in later chapters. WordNet is a dictionary designed for programmatic access by natural language processing systems. It has many different use cases, including:
- Looking up the definition of a word
- Finding synonyms and antonyms
- Exploring word relations and similarity
- Word sense disambiguation for words that have multiple uses and definitions
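Tokenization, the first topic of this chapter, can be tried directly with the `nltk.tokenize` module. The sketch below is an illustration rather than the book's recipe. The convenience functions `sent_tokenize()` and `word_tokenize()` rely on a pretrained Punkt model that has to be downloaded first, so to stay self-contained this version assumes only that the `nltk` package is installed. It uses an untrained `PunktSentenceTokenizer` to split sentences and the rule-based `TreebankWordTokenizer` to split words; the sample text is arbitrary.

```python
from nltk.tokenize import PunktSentenceTokenizer, TreebankWordTokenizer

text = "Hello World. It's good to see you. Thanks for buying this book."

# An untrained Punkt tokenizer needs no downloaded model; it only knows
# the default sentence-ending characters (. ? !) and no abbreviations.
sentence_tokenizer = PunktSentenceTokenizer()
sentences = sentence_tokenizer.tokenize(text)

# The Treebank tokenizer is a set of regular expressions, so it also works
# without downloaded data. It is meant to be applied one sentence at a time.
word_tokenizer = TreebankWordTokenizer()
words = [word_tokenizer.tokenize(sentence) for sentence in sentences]

for sentence, tokens in zip(sentences, words):
    print(sentence, "->", tokens)
```

Because the untrained Punkt model has no abbreviation list, text such as "Mr. Smith" would be split after "Mr."; the pretrained model loaded by `sent_tokenize()` is the better choice for real text.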
NLTK includes a WordNet corpus reader, which we will use to access and explore WordNet. A corpus is just a body of text, and corpus readers are designed to make accessing a corpus much easier than direct file access. We'll use WordNet again in later chapters, so it's important to familiarize yourself with the basics first.