- Lucene 4 Cookbook
- Edwood Ng Vineeth Mohan
- 2021-07-16 14:07:51
Defining custom TokenFilters
Some search behavior is specific enough that none of the built-in filters covers it, and we need a custom TokenFilter. To create one, we extend the TokenFilter class and override the incrementToken() method.
We will create a simple word-expanding TokenFilter that expands courtesy titles from the short form to the full word. For example, Dr expands to doctor.
How to do it…
Here is the sample code:
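import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class CourtesyTitleFilter extends TokenFilter {
    // Short form -> full word
    private final Map<String, String> courtesyTitleMap = new HashMap<String, String>();
    private final CharTermAttribute termAttr;

    public CourtesyTitleFilter(TokenStream input) {
        super(input);
        termAttr = addAttribute(CharTermAttribute.class);
        courtesyTitleMap.put("Dr", "doctor");
        courtesyTitleMap.put("Mr", "mister");
        courtesyTitleMap.put("Mrs", "missus");
    }

    // final: Lucene asserts that incrementToken() or the class is final
    @Override
    public final boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        String small = termAttr.toString();
        if (courtesyTitleMap.containsKey(small)) {
            termAttr.setEmpty().append(courtesyTitleMap.get(small));
        }
        return true;
    }
}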
How it works…
We create the CourtesyTitleFilter class by extending TokenFilter. In its constructor, we register a CharTermAttribute for reading and modifying the token text, and populate courtesyTitleMap with the short-form-to-word mappings. In the overridden incrementToken() method, we first advance the input TokenStream; if it has no more tokens, the method returns false. Otherwise, we look up the current token in courtesyTitleMap. If a mapping is found, we replace the token text through the CharTermAttribute: setEmpty() clears the attribute, and append() writes the new value from courtesyTitleMap. The method then returns true to signal that a token is available.
When you run this filter as part of an analysis chain that splits text on whitespace and applies a lowercase filter at the end, the string Dr Watson becomes [doctor] [watson] in the output.
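To make the order of the chain concrete, here is a minimal plain-Java sketch of the same three stages: whitespace splitting, title expansion, then lowercasing. It deliberately does not use Lucene. SketchTokenStream and every other Sketch* class are hypothetical stand-ins invented for this illustration, and they only imitate the incrementToken() contract described above, so treat the sketch as a model of the control flow rather than as the Lucene API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for a token stream: advance first, then read the current term. */
interface SketchTokenStream {
    boolean incrementToken(); // returns false once the stream is exhausted
    String term();            // text of the current token
}

/** Stage 1: splits the input text on whitespace. */
class SketchWhitespaceTokenizer implements SketchTokenStream {
    private final String[] parts;
    private int next = 0;
    private String current;

    SketchWhitespaceTokenizer(String text) {
        String trimmed = text.trim();
        parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public boolean incrementToken() {
        if (next >= parts.length) {
            return false;
        }
        current = parts[next++];
        return true;
    }

    public String term() { return current; }
}

/** Stage 2: expands courtesy titles, mirroring the lookup-and-replace step. */
class SketchCourtesyTitleFilter implements SketchTokenStream {
    private final SketchTokenStream input;
    private final Map<String, String> titles = new HashMap<>();
    private String current;

    SketchCourtesyTitleFilter(SketchTokenStream input) {
        this.input = input;
        titles.put("Dr", "doctor");
        titles.put("Mr", "mister");
    }

    public boolean incrementToken() {
        if (!input.incrementToken()) {
            return false; // upstream has no more tokens
        }
        String t = input.term();
        current = titles.getOrDefault(t, t); // keep the token when no mapping exists
        return true;
    }

    public String term() { return current; }
}

/** Stage 3: lowercases every token; it runs last in the chain. */
class SketchLowerCaseFilter implements SketchTokenStream {
    private final SketchTokenStream input;
    private String current;

    SketchLowerCaseFilter(SketchTokenStream input) { this.input = input; }

    public boolean incrementToken() {
        if (!input.incrementToken()) {
            return false;
        }
        current = input.term().toLowerCase(Locale.ROOT);
        return true;
    }

    public String term() { return current; }
}

class CourtesyTitlePipelineSketch {
    /** Wraps the stages in chain order and collects the resulting tokens. */
    static List<String> analyze(String text) {
        SketchTokenStream stream = new SketchLowerCaseFilter(
                new SketchCourtesyTitleFilter(new SketchWhitespaceTokenizer(text)));
        List<String> out = new ArrayList<>();
        while (stream.incrementToken()) {
            out.add(stream.term());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Dr Watson")); // prints [doctor, watson]
    }
}
```

The order matters because the map keys are capitalized. The title filter has to see Dr before the lowercase stage runs, so an input of dr Watson passes through this sketch unexpanded as [dr, watson].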