
  • Mastering MongoDB 3.x
  • Alex Giamas
  • 343 words
  • 2021-08-20 10:10:51

Modeling data for keyword searches

Searching for keywords in a document is a common operation for many applications. If search is a core operation, it makes sense to use a specialized search store such as Elasticsearch; however, MongoDB can be used efficiently until scale dictates moving to a different solution.

The basic need for keyword search is to be able to search the entire document for keywords. For example, with a document in the products collection:

{
  name: "Macbook Pro late 2016 15in",
  manufacturer: "Apple",
  price: 2000,
  keywords: [ "Macbook Pro late 2016 15in", "2000", "Apple", "macbook", "laptop", "computer" ]
}

Because keywords is an array, an index on that field is a multi-key index, with one index entry per array element:

> db.products.createIndex( { keywords: 1 } )

Now we can search the keywords field for the value of the name, manufacturer, or price field, as well as for any of the custom keywords we set up. This approach is neither efficient nor flexible: we need to keep the keywords list in sync with the other fields, we can't use stemming, and we can't rank results (it is closer to filtering than to searching). Its only upside is that it is quick to implement.
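One way to reduce the synchronization burden is to derive the keywords array from the other fields each time the document is written. The following is a minimal sketch under that assumption, not the book's own method; buildKeywords and its extra parameter are hypothetical names introduced here for illustration. The database calls run only where the shell's db object is defined.

```javascript
// Hypothetical helper: derive the keywords array from a product document
// so that it cannot drift out of sync with name, price, and manufacturer.
function buildKeywords(product, extra) {
  const derived = [product.name, String(product.price), product.manufacturer];
  // A Set removes duplicates while preserving insertion order.
  return Array.from(new Set(derived.concat(extra || [])));
}

const product = {
  name: "Macbook Pro late 2016 15in",
  manufacturer: "Apple",
  price: 2000
};
product.keywords = buildKeywords(product, ["macbook", "laptop", "computer"]);

// In the mongo shell, store the document and look it up by keyword.
if (typeof db !== "undefined") {
  db.products.insertOne(product);
  // Exact, case-sensitive match against any single array element.
  printjson(db.products.find({ keywords: "laptop" }).toArray());
}
```

Note that the match is exact: a query for "Laptop" or "laptops" would not find this document, which is the limitation that text indexes address.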

Since version 2.4, MongoDB has had a special text index type. It can be declared on one or more fields and supports stemming, tokenization, exact phrases (" "), negation (-), and weighted results.

Index declaration on three fields with custom weights:

db.products.createIndex(
  {
    name: "text",
    manufacturer: "text",
    price: "text"
  },
  {
    weights: {
      name: 10,
      manufacturer: 5,
      price: 1
    },
    name: "ProductIndex"
  }
)

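In this example, a match in name counts 10 times as much as a match in price, but only twice as much as a match in manufacturer. Note that a text index only covers string values, so price contributes to the results only if it is stored as a string.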
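A text index is queried with the $text operator, and results can be ranked by their text score. The sketch below assembles a $search string using the phrase and negation syntax; buildSearchString is a hypothetical helper, not part of MongoDB, and the search terms are made up for illustration. The query itself runs only in the mongo shell, where db is defined.

```javascript
// Hypothetical helper: build a $search string from plain terms,
// exact phrases (wrapped in double quotes), and excluded terms (prefixed with -).
function buildSearchString(terms, phrases, excluded) {
  const quoted = phrases.map(function (p) { return '"' + p + '"'; });
  const negated = excluded.map(function (t) { return "-" + t; });
  return terms.concat(quoted, negated).join(" ");
}

const search = buildSearchString(["macbook"], ["late 2016"], ["refurbished"]);

if (typeof db !== "undefined") {
  const results = db.products
    // Project the relevance score computed from the index weights...
    .find({ $text: { $search: search } }, { score: { $meta: "textScore" } })
    // ...and return the most relevant documents first.
    .sort({ score: { $meta: "textScore" } })
    .toArray();
  printjson(results);
}
```

Matching documents must contain the quoted phrase and must not contain the negated term.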

A text index can also be declared with the wildcard specifier $**, which indexes every field that contains string data:

db.collection.createIndex( { "$**": "text" } )

This is useful when we have unstructured data and may not know in advance all the fields that documents will contain. We can drop a text index by name, just like any other index.
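As a brief sketch of creating and then dropping such an index by name: the collection notes and the index name AllStringFields are hypothetical, chosen only for illustration. A separate collection is used because a collection can have at most one text index, so this would fail on a collection that already has one.

```javascript
// Wildcard text index specification: covers every string field.
const wildcardSpec = { "$**": "text" };
// Hypothetical index name; naming the index makes it easy to drop later.
const indexName = "AllStringFields";

// Runs only in the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  // `notes` is a hypothetical collection used only for illustration.
  db.notes.createIndex(wildcardSpec, { name: indexName });
  printjson(db.notes.getIndexes()); // inspect the indexes on the collection
  db.notes.dropIndex(indexName);    // drop by name, like any other index
}
```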

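The greatest advantage, beyond the features themselves, is that the database does all the record keeping: the index is maintained automatically as documents change, so there are no keyword lists to keep in sync.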
