Chapter 2. Finding and Working with Words

In this chapter, we cover the following recipes:

  • Introduction to tokenizer factories – finding words in a character stream
  • Combining tokenizers – lowercase tokenizer
  • Combining tokenizers – stop word tokenizers
  • Using Lucene/Solr tokenizers
  • Using Lucene/Solr tokenizers with LingPipe
  • Evaluating tokenizers with unit tests
  • Modifying tokenizer factories
  • Finding words for languages without white spaces