
Searching your documents

A search operates on a large set of documents, of which only a subset is of interest to us. That subset is defined by a set of constraints and conditions. But a search does not stop there. You may also want a snapshot view of your query result. In our case, if a user searches for dell, they might want to see the distinct product types and the number of documents under each. This is called an aggregation. Aggregations enhance the search experience and make the results easier to explore. Here, we look at the querying options through which we can express our requirements and communicate them to Elasticsearch.
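As a rough illustration of the aggregation idea, the sketch below assembles a request body that searches for dell and asks for a document count per product type. Python is used here only to build and print the JSON. The field name product_type and the aggregation label product_types are assumptions made for this example, not names taken from our index; the match clause itself is covered in the next section.

```python
import json


def build_search_with_type_counts(keyword, type_field="product_type"):
    """Build a request body: a match query plus a terms aggregation.

    type_field is a hypothetical field name used only for illustration.
    """
    return {
        "query": {"match": {"_all": keyword}},
        "aggs": {
            # "product_types" is an arbitrary label for this aggregation.
            # A terms aggregation returns one bucket per distinct value,
            # each with its document count.
            "product_types": {"terms": {"field": type_field}}
        },
    }


body = build_search_with_type_counts("dell")
print(json.dumps(body, indent=2))
```

The response to such a request would carry the matching documents together with one bucket per product type, which is the snapshot view described above.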

In our search application, we expose a single search box. We hide from the user which fields are searched and what precedence those fields have. Let's see which query types suit this search box best.

A match query

A match query is the ideal place to start. It can be used to search many field types, such as numbers, strings, or even dates. Let's see how we can use it to back the search input box. Assume that a user searches for the keyword laptop. It makes sense to search the name and description fields for this keyword, but not the price or date fields.

Note

By default, Elasticsearch maintains an additional searchable field called _all, which combines the values of all the fields in a document. Hence, for a document-level search, _all is a good choice.

A simple match query for the word laptop across all fields is as follows:

{
  "query": {
    "match": {
      "_all": "laptop"
    }
  }
}

Wait, won't a search on _all also cover the date and price fields, which we don't intend? Not in this case. Remember, we set include_in_all to false for every field other than name and description. This ensures that the values of those other fields do not flow into _all.
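The include_in_all setting lives in the index mapping. As a minimal sketch, assuming a type named product, the four fields discussed here, and the older string field type (all assumptions for this example, since the actual mapping is not shown in this section), the mapping could be assembled like this:

```python
import json


def build_mapping(searchable, excluded, type_name="product"):
    """Return a mapping in which only the `searchable` fields feed _all.

    type_name and the field types passed in are hypothetical.
    """
    properties = {}
    for name, field_type in searchable.items():
        # include_in_all defaults to true, so nothing extra is needed here.
        properties[name] = {"type": field_type}
    for name, field_type in excluded.items():
        # Keep these values out of the _all field.
        properties[name] = {"type": field_type, "include_in_all": False}
    return {type_name: {"properties": properties}}


mapping = build_mapping(
    searchable={"name": "string", "description": "string"},
    excluded={"price": "double", "date": "date"},
)
print(json.dumps(mapping, indent=2))
```

With a mapping of this shape, a match query on _all only sees the text of name and description.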

Sweet, we can now search the string fields that matter to us and get neat results. However, the requirement from management has changed. Rather than treating the name and description fields with equal precedence, we would like to give more weight to the name field. This means that a document in which the word matches in the name field should be considered more relevant than a document in which the match is only in the description field. Let's see how we can achieve this using a variant of the match query.

Multifield match query

A multifield match query searches multiple fields rather than a single field. It doesn't stop there: you can also assign a precedence, or boost, to each field. This lets us tell Elasticsearch to score matches on certain fields higher than others:

{
    "query": {
        "multi_match": {
            "query": "laptop",
            "fields": [
                "name^2",
                "description"
            ]
        }
    }
}

Here, we ask Elasticsearch to match the word laptop against both the name and description fields, but to give a match on name (boosted by a factor of 2 through the ^2 suffix) greater relevance than a match on description.
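If the field weights need to change often, the fields list can be generated rather than typed by hand. The following is a small sketch, not part of the application described here; the helper name build_multi_match and the idea of keeping the weights in a dictionary are assumptions made for illustration.

```python
import json


def build_multi_match(keyword, field_boosts):
    """Build a multi_match request body from a {field: boost} dictionary.

    Hypothetical helper: a boost of 1 is written as the bare field name,
    any other boost uses the field^boost notation.
    """
    fields = []
    for name, boost in field_boosts.items():
        fields.append(name if boost == 1 else f"{name}^{boost}")
    return {"query": {"multi_match": {"query": keyword, "fields": fields}}}


body = build_multi_match("laptop", {"name": 2, "description": 1})
print(json.dumps(body, indent=2))
```

Called with the weights above, the helper produces the same request body as the multifield match query in this section.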

Let's consider the following documents:

  • Document A:
    • Name: Lenovo laptop
    • Description: This is a great product with very high rating from Lenovo
  • Document B:
    • Name: Lenovo bags
    • Description: These are great laptop bags with very high rating from Lenovo

A search on the word laptop will rank Document A above Document B, which makes good sense in a real-world scenario.
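To see why the boost alone is enough to order these two documents, here is a deliberately simplified model. It is not Elasticsearch's scoring, which also depends on factors such as term frequency and field length. The toy_score function is a hypothetical stand-in that gives each document the boost of its best matching field, loosely mirroring the way multi_match by default favours the best matching field.

```python
def toy_score(doc, keyword, boosts):
    """Toy relevance: the highest boost among fields containing the keyword.

    This ignores everything a real relevance score uses except the boost.
    """
    best = 0.0
    for field, boost in boosts.items():
        words = doc.get(field, "").lower().split()
        if keyword.lower() in words:
            best = max(best, float(boost))
    return best


docs = {
    "A": {
        "name": "Lenovo laptop",
        "description": "This is a great product with very high rating from Lenovo",
    },
    "B": {
        "name": "Lenovo bags",
        "description": "These are great laptop bags with very high rating from Lenovo",
    },
}
boosts = {"name": 2, "description": 1}

# Sort document IDs from most to least relevant under the toy model.
ranking = sorted(docs, key=lambda d: toy_score(docs[d], "laptop", boosts), reverse=True)
print(ranking)
```

Under this model, Document A scores 2.0 through its name field and Document B scores 1.0 through its description field, so A is listed first.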
