Title: Natural Language Processing with Java and LingPipe Cookbook
Authors: Breck Baldwin, Krishna Dayanidhi
Updated: 2021-08-05 17:12:51

Introduction

An important part of building NLP systems is to work with the appropriate unit for processing. This chapter addresses the abstraction layer associated with the word level of processing. This is called tokenization, which amounts to grouping adjacent characters into meaningful chunks in support of classification, entity finding, and the rest of NLP.
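To make the idea of grouping adjacent characters into chunks concrete, here is a minimal sketch in plain Java. It does not use LingPipe's tokenizer classes; the class name SimpleTokenizerSketch and the token definition (a run of letters or digits, or a single punctuation character) are assumptions chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of LingPipe.
class SimpleTokenizerSketch {

    // Assumed token definition: either a run of letters/digits, or one
    // non-whitespace character that is neither a letter nor a digit.
    private static final Pattern TOKEN =
        Pattern.compile("[\\p{L}\\p{N}]+|[^\\p{L}\\p{N}\\s]");

    // Scans the text left to right and collects each match as a token.
    // Whitespace between tokens is skipped, not returned.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints one token per line for a sample sentence.
        for (String token : tokenize("Tokenizers group characters, don't they?")) {
            System.out.println(token);
        }
    }
}
```

Even this small sketch involves a design decision: under the assumed pattern, "don't" is split into three tokens (don, ', t). Whether that is the appropriate unit depends on the downstream task, which is why the choice of tokenizer matters.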

LingPipe provides tokenizers for a broad range of needs, not all of which are covered in this book. See the Javadoc for tokenizers that do stemming, Soundex (tokens based on what English words sound like), and more.
