Introduction

An important part of building NLP systems is working with the appropriate unit of processing. This chapter addresses the abstraction layer associated with the word level of processing. Working at this level is called tokenization: grouping adjacent characters into meaningful chunks that support classification, entity finding, and the rest of NLP.
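To make "grouping adjacent characters into meaningful chunks" concrete, here is a minimal sketch in plain Java. It is not LingPipe's API or its tokenization rules. The class SimpleTokenizer and its tokenize method are hypothetical, and the policy is an assumption chosen for illustration: runs of letters or digits form one token, every other non-whitespace character is a token of its own, and whitespace only separates tokens.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified tokenizer for illustration only (not LingPipe's).
 * Assumed policy: adjacent letters/digits form one token, each other
 * non-whitespace character is its own token, whitespace just separates.
 */
public class SimpleTokenizer {

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace marks a boundary and is never part of a token.
                i++;
            } else if (Character.isLetterOrDigit(c)) {
                // Group the run of adjacent letters/digits into a single chunk.
                int start = i;
                while (i < n && Character.isLetterOrDigit(text.charAt(i))) {
                    i++;
                }
                tokens.add(text.substring(start, i));
            } else {
                // Punctuation and symbols stand alone as one-character tokens.
                tokens.add(String.valueOf(c));
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [Tokenize, this, :, 3, words, ,, 2, commas, !]
        System.out.println(tokenize("Tokenize this: 3 words, 2 commas!"));
    }
}
```

Real tokenizers make many more decisions than this sketch does, for example how to treat contractions, hyphenated words, numbers with decimal points, or URLs, which is why a library like LingPipe offers several tokenizer implementations rather than one fixed rule.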

LingPipe provides tokenizers for a broad range of needs, many of which are not covered in this book. See the Javadoc for tokenizers that do stemming, Soundex (tokens based on what English words sound like), and more.