Defining custom analyzers

You need to create a custom analyzer when none of the built-in analyzers provides the behavior your search application requires. To continue with our CourtesyTitleFilter example, we will create CourtesyTitleAnalyzer.

An analyzer consists of exactly one Tokenizer followed by zero or more TokenFilters. We will build our Analyzer by extending the abstract Analyzer class and implementing its createComponents method.

How to do it…

Here is the sample code for CourtesyTitleAnalyzer:

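import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LetterTokenizer;

public class CourtesyTitleAnalyzer extends Analyzer {

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // Source of the chain: split the input at non-letter characters
        Tokenizer letterTokenizer = new LetterTokenizer(reader);
        // Chain our custom filter onto the tokenizer's output
        TokenStream filter = new CourtesyTitleFilter(letterTokenizer);
        return new TokenStreamComponents(letterTokenizer, filter);
    }
}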

How it works…

An Analyzer is created by extending the abstract Analyzer class, as shown in this example. We then override the createComponents method, adding a LetterTokenizer, which splits text at non-letter characters, and CourtesyTitleFilter as a TokenFilter that consumes the tokenizer's output. Finally, we return a new TokenStreamComponents instance initialized with the Tokenizer (the source of the chain) and the TokenFilter (the end of the chain).
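
To see what each stage of this chain contributes, it can be modelled in plain Java without Lucene on the classpath. This is only a sketch under stated assumptions: the class CourtesyTitleChainSketch and its methods are hypothetical, it works on whole lists instead of handing tokens on one at a time through incrementToken() as a real TokenStream does, and the title mapping (Dr to doctor, Mr to mister, Mrs to miss) is assumed for illustration, because this section does not show the mappings CourtesyTitleFilter actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java model of the LetterTokenizer -> CourtesyTitleFilter chain.
// Not Lucene code: a real TokenStream emits tokens one at a time.
public class CourtesyTitleChainSketch {

    // Hypothetical title mapping, assumed for illustration only.
    static final Map<String, String> TITLES =
            Map.of("Dr", "doctor", "Mr", "mister", "Mrs", "miss");

    // Mimics LetterTokenizer: a token is a maximal run of letters, and any
    // character for which Character.isLetter is false ends the current token.
    static List<String> letterTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Mimics a TokenFilter: rewrites each token it receives from upstream.
    static List<String> courtesyTitleFilter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(TITLES.getOrDefault(token, token));
        }
        return out;
    }

    // Mimics createComponents: the tokenizer feeds the filter.
    static List<String> analyze(String text) {
        return courtesyTitleFilter(letterTokenize(text));
    }

    public static void main(String[] args) {
        System.out.println(analyze("Dr. Smith met Mrs. Jones")); // [doctor, Smith, met, miss, Jones]
    }
}
```

The periods after Dr and Mrs never reach the filter: the tokenizer has already discarded them as non-letter characters, so the filter only has to match the bare abbreviations. Ordering matters in the chain because each filter sees only what the previous stage emits.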

Note that createComponents is the only method we need to override. We don't need a custom constructor, because components are not added when the Analyzer is constructed; they are created when createComponents is called, which Analyzer's tokenStream method does the first time components are needed. Also note that we cannot override the tokenStream method, because it is declared final.
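
The arrangement described in the previous paragraph is an instance of the template method pattern: the base class owns the public entry point and calls one protected hook to obtain the token chain, building it lazily on first use and reusing it afterwards. The following is a simplified, hypothetical model of that arrangement in plain Java, not Lucene's source; SketchAnalyzer, WhitespaceSketchAnalyzer, and the createCount counter are invented for illustration, and a chain is reduced to a function from text to a list of tokens.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class LazyComponentsSketch {

    // Hypothetical stand-in for Analyzer: the base class owns tokenStream()
    // and asks subclasses only for their component chain.
    abstract static class SketchAnalyzer {
        // One cached chain per field name (Lucene handles reuse via a ReuseStrategy).
        private final Map<String, Function<String, List<String>>> cache = new HashMap<>();
        int createCount = 0; // how many times createComponents() has run

        // The single extension point, analogous to createComponents.
        protected abstract Function<String, List<String>> createComponents(String fieldName);

        // final: subclasses cannot change how chains are built or reused.
        public final List<String> tokenStream(String fieldName, String text) {
            Function<String, List<String>> chain = cache.get(fieldName);
            if (chain == null) {
                // Components are created on first use, not in the constructor.
                chain = createComponents(fieldName);
                createCount++;
                cache.put(fieldName, chain);
            }
            return chain.apply(text);
        }
    }

    // Hypothetical subclass: overrides only createComponents, no constructor.
    static class WhitespaceSketchAnalyzer extends SketchAnalyzer {
        @Override
        protected Function<String, List<String>> createComponents(String fieldName) {
            return text -> List.of(text.trim().split("\\s+"));
        }
    }

    public static void main(String[] args) {
        WhitespaceSketchAnalyzer analyzer = new WhitespaceSketchAnalyzer();
        System.out.println(analyzer.createCount);                        // 0: nothing built yet
        System.out.println(analyzer.tokenStream("body", "hello world")); // [hello, world]
        System.out.println(analyzer.tokenStream("body", "again"));       // [again]
        System.out.println(analyzer.createCount);                        // 1: chain was reused
    }
}
```

Because tokenStream is final, a subclass cannot bypass the creation and reuse step; its only job is to describe the chain, which is the role createComponents plays in CourtesyTitleAnalyzer.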