Defining custom analyzers

You need a custom analyzer when the built-in analyzers do not provide the behavior your search application requires. Continuing with our CourtesyTitleFilter example, we will create a CourtesyTitleAnalyzer.

An analyzer consists of one Tokenizer followed by zero or more TokenFilters. We will build our Analyzer by extending the Analyzer abstract class and implementing its createComponents method.

How to do it…

Here is the sample code for CourtesyTitleAnalyzer:

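import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LetterTokenizer;

public class CourtesyTitleAnalyzer extends Analyzer {

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // Split the input into tokens at every non-letter character.
        Tokenizer letterTokenizer = new LetterTokenizer(reader);
        // Wrap the tokenizer with the CourtesyTitleFilter defined earlier.
        TokenStream filter = new CourtesyTitleFilter(letterTokenizer);
        return new TokenStreamComponents(letterTokenizer, filter);
    }
}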

How it works…

We create an Analyzer by extending the Analyzer abstract class, as shown in this example. In the overridden createComponents method, we instantiate a LetterTokenizer, which splits text at non-letter characters, and wrap it in a CourtesyTitleFilter, which is our TokenFilter. Finally, we return a new TokenStreamComponents instance initialized with the Tokenizer and the TokenFilter.
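To make the data flow through the chain concrete, here is a minimal plain-Java sketch of the two stages. It does not use Lucene: the class name CourtesyTitleChainSketch and the title mappings ("Dr" to "doctor", "Mr" to "mister") are assumptions chosen for illustration, and the real CourtesyTitleFilter may map titles differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java model of the tokenizer-plus-filter chain; it does not use Lucene.
class CourtesyTitleChainSketch {

    // Hypothetical title mappings, chosen only for illustration.
    static final Map<String, String> TITLES = Map.of("Dr", "doctor", "Mr", "mister");

    // Stage 1: emit maximal runs of letters, as a letter tokenizer would.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                // A non-letter character ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Stage 2: replace known courtesy titles and pass other tokens through.
    static List<String> filter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(TITLES.getOrDefault(token, token));
        }
        return out;
    }

    // The filter consumes the tokenizer's output, mirroring the analyzer chain.
    static List<String> analyze(String text) {
        return filter(tokenize(text));
    }

    public static void main(String[] args) {
        // Prints: [doctor, Smith, met, mister, Jones, at, am]
        System.out.println(analyze("Dr. Smith met Mr. Jones at 9am"));
    }
}
```

Note that the digit in "9am" acts as a token boundary, so only "am" survives, and the periods after the titles are dropped by the tokenizer before the filter ever sees them.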

Note that the only method we need to override is createComponents. We don't need to define a constructor to build our Analyzer because components are not added during construction; they are added when the createComponents method is called, which makes createComponents the single place to customize an Analyzer. Also note that we cannot override the tokenStream method because it's declared as final.
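The design described here is a form of the template method pattern. The following sketch models it in plain Java under simplifying assumptions: MiniAnalyzer and MiniCourtesyTitleAnalyzer are hypothetical names, components are represented as plain strings, and the caching is far simpler than whatever reuse logic Lucene's real Analyzer applies. It only illustrates why a subclass needs no constructor and cannot replace tokenStream.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the pattern; this is not Lucene's actual Analyzer class.
abstract class MiniAnalyzer {
    private List<String> components; // built lazily, not in the constructor
    int buildCount = 0;              // counts calls to createComponents

    // Subclasses customize the chain by implementing this method.
    protected abstract List<String> createComponents(String fieldName);

    // Declared final, so subclasses cannot replace this entry point.
    public final List<String> tokenStream(String fieldName) {
        if (components == null) {
            components = createComponents(fieldName);
            buildCount++;
        }
        return components;
    }
}

// No constructor is needed: the chain is assembled on first use.
class MiniCourtesyTitleAnalyzer extends MiniAnalyzer {
    @Override
    protected List<String> createComponents(String fieldName) {
        List<String> chain = new ArrayList<>();
        chain.add("LetterTokenizer");
        chain.add("CourtesyTitleFilter");
        return chain;
    }

    public static void main(String[] args) {
        MiniCourtesyTitleAnalyzer analyzer = new MiniCourtesyTitleAnalyzer();
        System.out.println(analyzer.buildCount);          // nothing built yet
        System.out.println(analyzer.tokenStream("name")); // builds the chain
        System.out.println(analyzer.buildCount);
    }
}
```

Adding a method named tokenStream with the same signature to MiniCourtesyTitleAnalyzer would fail to compile, which is the same restriction the real Analyzer places on its subclasses.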