  • Lucene 4 Cookbook
  • Edwood Ng Vineeth Mohan
  • 222 words
  • 2021-07-16 14:07:51

Defining custom analyzers

It's necessary to create a custom analyzer when the built-in analyzers do not provide the needed behaviors for your search application. To continue with our CourtesyTitleFilter example, we will create CourtesyTitleAnalyzer.

An analyzer consists of exactly one Tokenizer followed by zero or more TokenFilters. We will build ours by extending the abstract Analyzer class and implementing its createComponents method.

How to do it…

Here is the sample code for CourtesyTitleAnalyzer:

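import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LetterTokenizer;

public class CourtesyTitleAnalyzer extends Analyzer {

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // Split the input into tokens at non-letter characters.
        Tokenizer letterTokenizer = new LetterTokenizer(reader);
        // Wrap the tokenizer with the CourtesyTitleFilter from the earlier example.
        TokenStream filter = new CourtesyTitleFilter(letterTokenizer);
        return new TokenStreamComponents(letterTokenizer, filter);
    }
}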

How it works…

An Analyzer is created by extending the abstract Analyzer class, as shown in this example. We override the createComponents method, using a LetterTokenizer to split text at non-letter characters and wrapping it in CourtesyTitleFilter, our TokenFilter. Finally, we return a new TokenStreamComponents instance initialized with the Tokenizer and the TokenFilter.

Note that createComponents is the only method we need to override. We don't need to define a constructor to build our Analyzer, because components are not added during construction; they are added when createComponents is called. Also note that we cannot override the tokenStream method because it is declared final.
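
To make the tokenizer-then-filter flow concrete, here is a simplified, dependency-free model in plain Java. It is only a sketch and not the Lucene API: the class AnalyzerSketch, its methods tokenize, expandTitles, and analyze, and the title mapping are all hypothetical names chosen for illustration. The behavior of the real CourtesyTitleFilter may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Simplified model of an analysis chain; NOT the Lucene API.
class AnalyzerSketch {

    // Hypothetical title mapping, assumed here only for illustration.
    private static final Map<String, String> TITLES = new HashMap<>();
    static {
        TITLES.put("dr", "doctor");
        TITLES.put("mr", "mister");
    }

    // Tokenizer stage: emit maximal runs of letters,
    // similar in spirit to what LetterTokenizer does.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Filter stage: replace known courtesy titles, pass other tokens through.
    static List<String> expandTitles(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            String expanded = TITLES.get(token.toLowerCase(Locale.ROOT));
            out.add(expanded != null ? expanded : token);
        }
        return out;
    }

    // "Analyzer": wires the tokenizer into the filter,
    // analogous to what createComponents sets up.
    static List<String> analyze(String text) {
        return expandTitles(tokenize(text));
    }

    public static void main(String[] args) {
        // prints [doctor, Smith, met, mister, Jones]
        System.out.println(analyze("Dr. Smith met Mr. Jones"));
    }
}
```

The point of the sketch is the wiring: the tokenizer is the single source of tokens, and each filter only sees what the previous stage emitted. In Lucene the same chain is expressed by passing the Tokenizer to the TokenFilter constructor and handing both to TokenStreamComponents.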
