
  • Lucene 4 Cookbook
  • Edwood Ng, Vineeth Mohan
  • 240 words
  • 2021-07-16 14:07:50

Obtaining a TokenStream

TokenStream is the intermediate data format passed between components during analysis. It serves as both the input and the output of every filter. A tokenizer consumes text from a Reader and outputs the result as a TokenStream. Let's explore TokenStream in detail in this section.

Getting ready

The Analyzer class is an abstract base class with two methods of interest. The first is createComponents(String fieldName, Reader reader), where the analyzer is assembled by chaining a tokenizer and filters. The second is tokenStream(String fieldName, Reader reader), which is the method we will review in this section. We will use tokenStream to obtain a processed TokenStream so that we can examine its content after analysis.
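To make the split between createComponents and tokenStream concrete, here is a deliberately simplified, stdlib-only model of the same pattern. None of these classes come from Lucene: SketchTokenStream, LetterTokenizerSketch, LowerCaseFilterSketch, SketchAnalyzer and the analyze helper are hypothetical stand-ins. They only mimic the shape of the real design: a tokenizer reads from a Reader, each filter wraps the stream before it, and tokenStream hands back the last stage of the chain. In Lucene 4, createComponents actually returns a TokenStreamComponents object pairing the tokenizer with the final filter, and the analyzer reuses components across calls; the sketch collapses both details.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of an analysis chain; these are NOT Lucene classes.
abstract class SketchTokenStream {
    // Returns the next token, or null once the stream is exhausted.
    abstract String next() throws IOException;
}

// "Tokenizer": consumes characters from a Reader and emits runs of letters.
class LetterTokenizerSketch extends SketchTokenStream {
    private final Reader reader;

    LetterTokenizerSketch(Reader reader) { this.reader = reader; }

    @Override
    String next() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (Character.isLetter(c)) {
                sb.append((char) c);
            } else if (sb.length() > 0) {
                break; // a non-letter ends the current token
            }
        }
        return sb.length() > 0 ? sb.toString() : null;
    }
}

// "Filter": wraps an upstream stream and transforms each token it yields.
class LowerCaseFilterSketch extends SketchTokenStream {
    private final SketchTokenStream input;

    LowerCaseFilterSketch(SketchTokenStream input) { this.input = input; }

    @Override
    String next() throws IOException {
        String token = input.next();
        return token == null ? null : token.toLowerCase();
    }
}

// "Analyzer": createComponents wires the chain; tokenStream returns its last stage.
class SketchAnalyzer {
    protected SketchTokenStream createComponents(String fieldName, Reader reader) {
        SketchTokenStream source = new LetterTokenizerSketch(reader);
        return new LowerCaseFilterSketch(source);
    }

    final SketchTokenStream tokenStream(String fieldName, Reader reader) {
        return createComponents(fieldName, reader);
    }

    // Hypothetical helper: analyzes text and drains the stream into a list.
    static List<String> analyze(String text) {
        SketchTokenStream ts = new SketchAnalyzer().tokenStream("myField", new StringReader(text));
        List<String> tokens = new ArrayList<>();
        try {
            for (String t = ts.next(); t != null; t = ts.next()) {
                tokens.add(t);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }
}
```

With this sketch, SketchAnalyzer.analyze("Text to be passed") yields the tokens text, to, be and passed. Swapping or adding a stage only touches createComponents, which is the point of keeping the wiring there.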

How to do it...

The tokenStream method takes two arguments: a field name and a Reader:

Reader reader = new StringReader("Text to be passed");
Analyzer analyzer = new SimpleAnalyzer();
TokenStream tokenStream = analyzer.tokenStream("myField", reader);

How it works…

An analyzer receives incoming text through a Reader. Internally, the Reader is passed to the Tokenizer, which breaks the text into tokens and exposes them as a TokenStream. From there, the TokenStream is handed from filter to filter at each step. A TokenStream is essentially an enumeration of tokens that you can iterate through. TokenStream extends AttributeSource, which provides the interface for retrieving token attributes and their values.
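The attribute-based design means a consumer does not receive a new object per token. Instead, it obtains attribute objects once and reads them after each advance, while the stream overwrites them in place. The following stdlib-only sketch models that contract. TermAttr, OffsetAttr, AttributeStreamSketch and describe are hypothetical names, not Lucene classes, and the sketch splits on whitespace purely for illustration. In Lucene 4 the corresponding consumer typically calls addAttribute(CharTermAttribute.class), then reset(), loops on incrementToken(), and finishes with end() and close(); the sketch only models reset() and incrementToken().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for term and offset attributes (NOT Lucene classes).
final class TermAttr {
    String value = "";
}

final class OffsetAttr {
    int start;
    int end;
}

// Simplified stream that exposes shared attribute objects and overwrites
// them in place on every successful incrementToken() call.
final class AttributeStreamSketch {
    private final String text;
    private final TermAttr termAttr = new TermAttr();
    private final OffsetAttr offsetAttr = new OffsetAttr();
    private int pos;

    AttributeStreamSketch(String text) { this.text = text; }

    // The same instances are returned every time, like attributes on an AttributeSource.
    TermAttr termAttribute() { return termAttr; }
    OffsetAttr offsetAttribute() { return offsetAttr; }

    void reset() { pos = 0; }

    // Advances to the next whitespace-separated token; returns false when exhausted.
    boolean incrementToken() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
        if (pos >= text.length()) return false;
        int start = pos;
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) pos++;
        termAttr.value = text.substring(start, pos);
        offsetAttr.start = start;
        offsetAttr.end = pos;
        return true;
    }

    // Consumer loop: fetch attributes once, then read them after each advance.
    static List<String> describe(String text) {
        AttributeStreamSketch ts = new AttributeStreamSketch(text);
        TermAttr term = ts.termAttribute();
        OffsetAttr offset = ts.offsetAttribute();
        List<String> out = new ArrayList<>();
        ts.reset();
        while (ts.incrementToken()) {
            out.add(term.value + "[" + offset.start + "," + offset.end + ")");
        }
        return out;
    }
}
```

For the input "Text to be passed", describe returns Text[0,4), to[5,7), be[8,10) and passed[11,17). Reusing attribute instances avoids allocating an object per token, which matters when analyzing large volumes of text.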
