- Solr Cookbook (Third Edition)
- Rafał Kuć
- 2021-08-06 19:39:23
Counting the number of fields
Imagine a situation where we have simple documents with titles and tags that we want to index in Solr. We want to distinguish the premium documents, the ones that have more tag values, because they are more valuable to our business. Of course, we could count the tags ourselves before indexing, but why not let Solr do this? This recipe shows you how.
How to do it...
Let's look at the steps we need to take to count the number of field values.
- We start with the index structure. We need to put the following section in the schema.xml file:

```xml
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="tags_count" type="int" indexed="true" stored="true"/>
```
- The next thing is our test data, which looks as follows:
```xml
<add>
  <doc>
    <field name="id">1</field>
    <field name="title">Solr Cookbook 4</field>
    <field name="tags">solr</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="title">Solr Cookbook 4 second edition</field>
    <field name="tags">search</field>
    <field name="tags">solr</field>
    <field name="tags">cookbook</field>
  </doc>
</add>
```
- In addition to this, we need to alter our solrconfig.xml file. First, we add the update request processor chain to the file:

```xml
<updateRequestProcessorChain name="count">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">tags</str>
    <str name="dest">tags_count</str>
  </processor>
  <processor class="solr.CountFieldValuesUpdateProcessorFactory">
    <str name="fieldName">tags_count</str>
  </processor>
  <processor class="solr.DefaultValueUpdateProcessorFactory">
    <str name="fieldName">tags_count</str>
    <int name="value">0</int>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```
- We also want our update request processor chain to be used with every indexing request, so we change the /update handler in the solrconfig.xml file so that it looks like this:

```xml
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">count</str>
  </lst>
</requestHandler>
```
- Now, to use the count information that Solr added automatically, we send the following query:
http://localhost:8983/solr/cookbook/select?q=title:cookbook&bf=field(tags_count)&defType=edismax
- Solr will position the document with more tags at the top of the result list:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="q">title:cookbook</str>
      <str name="defType">edismax</str>
      <str name="bf">field(tags_count)</str>
    </lst>
  </lst>
  <result name="response" numFound="2" start="0">
    <doc>
      <str name="id">2</str>
      <str name="title">Solr Cookbook 4 second edition</str>
      <arr name="tags">
        <str>search</str>
        <str>solr</str>
        <str>cookbook</str>
      </arr>
      <int name="tags_count">3</int>
      <long name="_version_">1467535763434373120</long>
    </doc>
    <doc>
      <str name="id">1</str>
      <str name="title">Solr Cookbook 4</str>
      <arr name="tags">
        <str>solr</str>
      </arr>
      <int name="tags_count">1</int>
      <long name="_version_">1467535763382992896</long>
    </doc>
  </result>
</response>
```
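To check the ordering programmatically, one can parse the XML response. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the response is trimmed to the fields we need and embedded as a string (the RESPONSE constant and the ranked_tag_counts helper are hypothetical names, and in practice the text would come from the HTTP response):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Solr response (responseHeader and other fields omitted).
RESPONSE = """<response>
  <result name="response" numFound="2" start="0">
    <doc>
      <str name="id">2</str>
      <int name="tags_count">3</int>
    </doc>
    <doc>
      <str name="id">1</str>
      <int name="tags_count">1</int>
    </doc>
  </result>
</response>"""


def ranked_tag_counts(xml_text):
    """Return (id, tags_count) pairs in the order Solr returned the documents."""
    root = ET.fromstring(xml_text)
    pairs = []
    for doc in root.find("result").findall("doc"):
        doc_id = doc.find("str[@name='id']").text
        count = int(doc.find("int[@name='tags_count']").text)
        pairs.append((doc_id, count))
    return pairs


print(ranked_tag_counts(RESPONSE))  # [('2', 3), ('1', 1)]
```

If the boost works as intended, the counts in the returned list should appear in non-increasing order whenever the text relevance of the matching documents is similar.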
Now, let's see how it works.
How it works...
The index structure is quite simple. It contains a unique identifier field, a title, a field holding the tags, and a field holding the number of tags. As you can see, in the example data we provide the identifier of the document, its title, and its tags. What we don't provide is the number of tags; Solr calculates it during indexing.
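For comparison, here is a small Python sketch that computes, on the client side, the tags_count values Solr is expected to produce for the test data. It is only a sanity check of the expected numbers and not part of the Solr setup; TEST_DATA and expected_tag_counts are hypothetical names:

```python
import xml.etree.ElementTree as ET

# The recipe's test data, as sent to Solr.
TEST_DATA = """<add>
  <doc>
    <field name="id">1</field>
    <field name="title">Solr Cookbook 4</field>
    <field name="tags">solr</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="title">Solr Cookbook 4 second edition</field>
    <field name="tags">search</field>
    <field name="tags">solr</field>
    <field name="tags">cookbook</field>
  </doc>
</add>"""


def expected_tag_counts(add_xml):
    """Map each document id to the number of tags values it carries."""
    counts = {}
    for doc in ET.fromstring(add_xml).findall("doc"):
        doc_id = doc.find("field[@name='id']").text
        counts[doc_id] = len(doc.findall("field[@name='tags']"))
    return counts


print(expected_tag_counts(TEST_DATA))  # {'1': 1, '2': 3}
```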
We also defined a new update request processor chain called count. It contains five update processors.
The first update processor, solr.CloneFieldUpdateProcessorFactory, copies the values of the field defined by the source property into the field defined by the dest property. The second update processor, solr.CountFieldValuesUpdateProcessorFactory, replaces the values of the field defined by the fieldName property with the number of those values. Because it overwrites the field it counts, we need solr.CloneFieldUpdateProcessorFactory to run before solr.CountFieldValuesUpdateProcessorFactory; counting the tags field directly would replace the tags themselves with their count. The third update processor, solr.DefaultValueUpdateProcessorFactory, sets the default value (defined by the value property) for the field defined by the fieldName property when a document has no value in that field. The remaining two processors, solr.LogUpdateProcessorFactory and solr.RunUpdateProcessorFactory, log the request information and run the actual update. By defining this chain, we tell Solr to first clone the tags field into tags_count, then replace the values in tags_count with their count, and finally, if the tags_count field has no value (for example, because the document has no tags), set it to 0.
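The order of the processors matters. The following Python sketch models, on plain dictionaries, what the clone, count, and default steps of the count chain do to a document. It is a simplified illustration with hypothetical function names, not Solr's implementation, and it leaves out logging and the actual indexing:

```python
def clone_field(doc, source, dest):
    # Modeled on CloneFieldUpdateProcessorFactory: append the source values to dest.
    # Nothing is created when the source field is missing or empty.
    if doc.get(source):
        doc[dest] = list(doc.get(dest, [])) + list(doc[source])
    return doc


def count_field_values(doc, field_name):
    # Modeled on CountFieldValuesUpdateProcessorFactory: replace the list of
    # values with its length; an absent field stays absent.
    if field_name in doc:
        doc[field_name] = len(doc[field_name])
    return doc


def default_value(doc, field_name, value):
    # Modeled on DefaultValueUpdateProcessorFactory: fill in the field only if missing.
    doc.setdefault(field_name, value)
    return doc


def run_count_chain(doc):
    """Apply the three counting steps of the 'count' chain, in order, to a copy of doc."""
    doc = dict(doc)  # shallow copy; the steps build new values instead of mutating lists
    doc = clone_field(doc, "tags", "tags_count")
    doc = count_field_values(doc, "tags_count")
    return default_value(doc, "tags_count", 0)


docs = [
    {"id": "1", "tags": ["solr"]},
    {"id": "2", "tags": ["search", "solr", "cookbook"]},
    {"id": "3"},  # hypothetical document without tags
]
for d in docs:
    print(d["id"], run_count_chain(d)["tags_count"])
```

This prints 1 1, 2 3, and 3 0, one pair per line. If the count step were applied to tags directly instead of to the clone, the tags list itself would be replaced by a number, which is why the chain clones first.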
We also alter the configuration of the /update handler (solr.UpdateRequestHandler) by adding a defaults section that sets the update.chain property to count, the name of our update request processor chain. This means that our chain will be used with every indexing request sent to this handler.
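Because the chain is configured in the defaults section, a client can also name it explicitly on a single request through the update.chain request parameter. The following Python sketch only builds such a request with the standard library and does not send it; it assumes the cookbook core at localhost:8983 used in this recipe and an XML body like the test data, and SOLR_UPDATE_URL and build_update_request are hypothetical names:

```python
import urllib.parse
import urllib.request

# Assumed location of the recipe's core.
SOLR_UPDATE_URL = "http://localhost:8983/solr/cookbook/update"


def build_update_request(add_xml, chain="count", commit=True):
    """Build (but do not send) an XML update request that names the chain explicitly."""
    params = {"update.chain": chain}
    if commit:
        params["commit"] = "true"
    url = SOLR_UPDATE_URL + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url,
        data=add_xml.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


req = build_update_request('<add><doc><field name="id">3</field></doc></add>')
print(req.full_url)
# urllib.request.urlopen(req) would send it to a running Solr instance.
```

The printed URL is http://localhost:8983/solr/cookbook/update?update.chain=count&commit=true.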
Our query searches for every document that includes the cookbook term in the title field, using the edismax query parser (defType=edismax). We also include a simple boost function, bf=field(tags_count), whose value edismax adds to each document's score, so documents with more tags score higher. As you can see in the results, the document with three tags is ranked above the document with one tag, which is exactly what we wanted to achieve.
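The effect of bf can be pictured as adding the function's value to each document's relevance score. The following Python sketch builds the recipe's query URL (URL-encoded, so title:cookbook becomes title%3Acookbook) and applies that additive rule to made-up relevance scores. The scores, BASE_URL, and the helper names are hypothetical, and real edismax scoring involves more than this toy model:

```python
from urllib.parse import urlencode

# Assumed location of the recipe's core.
BASE_URL = "http://localhost:8983/solr/cookbook/select"


def build_query_url(q, boost_field):
    """Build the select URL with edismax and an additive field() boost function."""
    params = {"q": q, "defType": "edismax", "bf": f"field({boost_field})"}
    return BASE_URL + "?" + urlencode(params)


def rank_with_additive_boost(docs):
    # Toy model: final score = text relevance + tags_count,
    # mirroring what bf=field(tags_count) adds to each score.
    return sorted(docs, key=lambda d: d["relevance"] + d["tags_count"], reverse=True)


print(build_query_url("title:cookbook", "tags_count"))

# Hypothetical relevance scores: the shorter title matches slightly better on text alone.
docs = [
    {"id": "1", "relevance": 0.9, "tags_count": 1},
    {"id": "2", "relevance": 0.7, "tags_count": 3},
]
print([d["id"] for d in rank_with_additive_boost(docs)])  # ['2', '1']
```

Because the boost is added rather than multiplied, a raw count can easily dominate small relevance scores, as it does here; in a real index, the size of the function's values relative to typical scores decides how strongly the tags count reorders the results.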