Chapter 2. Finding and Working with Words

In this chapter, we cover the following recipes:

  • Introduction to tokenizer factories – finding words in a character stream
  • Combining tokenizers – lowercase tokenizer
  • Combining tokenizers – stop word tokenizers
  • Using Lucene/Solr tokenizers
  • Using Lucene/Solr tokenizers with LingPipe
  • Evaluating tokenizers with unit tests
  • Modifying tokenizer factories
  • Finding words for languages without white spaces