
Indexing multiple geographical points

Let's assume we have a website that lets users search for companies not only by keywords but also by geographical location. In the real world, companies tend to have more than a single location. This is where we hit a limitation of the default spatial field used by Solr: it can index only a single location per document. We can either create a separate document for each company location and use group collapsing, or use a different field type that allows multivalued location fields. This recipe shows how to do the latter.

How to do it...

The following steps will take you through the process of enabling the indexing of multivalued spatial fields.

  1. First, we need to prepare our index structure by adding the following section to the schema.xml file:
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="text_general" indexed="true" stored="true" />
    <field name="loc" type="location_recursive" indexed="true" stored="true" multiValued="true" />
  2. We also need the location_recursive field type defined, so we add the following type to the same schema.xml file:
    <fieldType name="location_recursive" class="solr.SpatialRecursivePrefixTreeFieldType" distErrPct="0.025" maxDistErr="0.000009" units="degrees" />
  3. Now, we can index our data, which looks as follows:
    <add>
     <doc>
      <field name="id">1</field>
      <field name="name">Burger Deluxe</field>
      <field name="loc">51.30,-0.12</field>
      <field name="loc">38.89,-77.03</field>
     </doc>
     <doc>
      <field name="id">2</field>
      <field name="name">Chips and fish D.C. exclusive</field>
      <field name="loc">38.89,-77.03</field>
     </doc>
    </add>
  4. So, if we want to get all companies that are located within 50 kilometers of the centre of London, we will send the following query:
    http://localhost:8983/solr/cookbook/select?q=*:*&fq={!geofilt}&sfield=loc&pt=51.30,-0.12&d=50

    The results returned by Solr will look as follows:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
     <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">1</int>
      <lst name="params">
       <str name="q">*:*</str>
       <str name="pt">51.30,-0.12</str>
       <str name="d">50</str>
       <str name="fq">{!geofilt}</str>
       <str name="sfield">loc</str>
      </lst>
     </lst>
     <result name="response" numFound="1" start="0">
      <doc>
       <str name="id">1</str>
       <str name="name">Burger Deluxe</str>
       <arr name="loc">
        <str>51.30,-0.12</str>
        <str>38.89,-77.03</str>
       </arr>
       <long name="_version_">1468077157967200256</long>
      </doc>
     </result>
    </response>

As we can see, everything works as it should, so let's learn how it was done.
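
The indexing step can also be scripted. The following is a minimal Python sketch, not part of the original recipe, that builds the same update message from plain dictionaries. The build_add_message function name and the companies list are hypothetical and serve only as illustration. A list value is written as one field element per item, which is how the multivalued loc field is expressed. Sending the message to Solr is left out.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build a Solr XML <add> message from a list of dicts (hypothetical helper).

    A list value produces one <field> element per item, which is how a
    multivalued field such as loc is written in the XML update format.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # Treat scalars as single-item lists so both cases share one path.
            values = value if isinstance(value, list) else [value]
            for item in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(item)
    return ET.tostring(add, encoding="unicode")


# The two example companies from the recipe.
companies = [
    {"id": "1", "name": "Burger Deluxe",
     "loc": ["51.30,-0.12", "38.89,-77.03"]},
    {"id": "2", "name": "Chips and fish D.C. exclusive",
     "loc": ["38.89,-77.03"]},
]

message = build_add_message(companies)
print(message)
```

The output is a single-line XML string with the same structure as the data shown in step 3.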

How it works...

Each company is described with three fields: the company identifier (the id field), the company name (the name field), and multivalued company locations (the loc field).

To be able to index multiple locations, we use a new field type that we defined, location_recursive. It uses the solr.SpatialRecursivePrefixTreeFieldType class, which is new in Solr 4. It offers more features compared to the solr.LatLonType field type and is faster when it comes to filtering of spatial data. We configured it using three properties:

  • distErrPct: This defines the default precision for non-point shapes, both at index time and at query time. The value of the property can vary from 0.0 to 0.5. The closer the value is to 0, the more precise the shape will be, but the indexing will be slower, and the index will be larger. If we set the value of the property closer to 0.5, the queries against the field will be faster, but at the cost of less precision.
  • maxDistErr: This defines the highest level of detail required for the indexed data. If the property is left empty, the default detail level is one meter, which is about 0.000009 degrees, exactly the value we set. The solr.SpatialRecursivePrefixTreeFieldType field type needs this setting to calculate its internal spatial grid.
  • units: This is the unit used by the type; right now, the only value possible is degrees.
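
To see where the 0.000009 value comes from, we can convert one meter to degrees of arc. The following Python sketch assumes a spherical Earth with a mean radius of about 6,371 km. The constant and the function name are our own and are not part of Solr.

```python
import math

# Assumption: a spherical Earth with the commonly used mean radius in meters.
EARTH_MEAN_RADIUS_M = 6371008.8


def meters_to_degrees(meters):
    """Convert a distance along a great circle to degrees of arc."""
    return math.degrees(meters / EARTH_MEAN_RADIUS_M)


one_meter = meters_to_degrees(1.0)
print(f"{one_meter:.6f}")  # prints 0.000009
```

The same conversion can be used to pick a coarser maxDistErr value when meter-level detail is not needed.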

As you can see, the first company in our example data has two locations. The first location is the centre of London, and the second location is the centre of Washington, D.C. The second document has a single location, only in Washington, D.C.

Our query asks for all documents (q=*:*) and uses the geofilt filter (fq={!geofilt}). The geofilt filter needs three additional parameters to be passed:

  • sfield: This is the field used for spatial search, our loc field.
  • pt: This is the latitude and longitude of the point from which the distance will be calculated. In our case, it is the centre of London.
  • d: This is the distance from the given point. In our case, it is 50, which means 50 kilometers.
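
When the query is sent from code rather than typed into a browser, characters such as the curly braces and the comma have to be URL-encoded. The following Python sketch assembles the same request from its parameters using only the standard library. The geofilt_url function is a hypothetical helper, and the base address is the one used in this recipe.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def geofilt_url(base_url, sfield, pt, d, q="*:*"):
    """Build a select URL that filters by distance (hypothetical helper)."""
    params = [
        ("q", q),              # main query, all documents by default
        ("fq", "{!geofilt}"),  # distance filter
        ("sfield", sfield),    # spatial field to filter on
        ("pt", pt),            # centre point as "latitude,longitude"
        ("d", str(d)),         # distance in kilometers
    ]
    return base_url + "?" + urlencode(params)


url = geofilt_url("http://localhost:8983/solr/cookbook/select",
                  sfield="loc", pt="51.30,-0.12", d=50)
print(url)

# Decoding the query string gives back the original parameter values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["fq"], decoded["pt"], decoded["d"])
```

Keeping sfield, pt, and d as separate request parameters, as the recipe does, lets the fq value stay a short constant.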

As you can see, the query returns only a single document: the first one, which is the only document with a location in London. This means that everything works as it should.
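
We can cross-check this result outside of Solr. The following Python sketch applies the great-circle (haversine) formula to the example data and treats a document as a match when at least one of its locations lies within the radius. This only illustrates the matching rule for a multivalued field. It is not how Solr evaluates the filter internally, and the Earth radius and all names used here are our own assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def matches(locations, center, radius_km):
    """A document matches if any one of its locations is within the radius."""
    return any(haversine_km(lat, lon, *center) <= radius_km
               for lat, lon in locations)


# Locations of the two example documents, keyed by document id.
docs = {
    "1": [(51.30, -0.12), (38.89, -77.03)],
    "2": [(38.89, -77.03)],
}
center = (51.30, -0.12)  # the pt parameter
hits = [doc_id for doc_id, locs in docs.items() if matches(locs, center, 50)]
print(hits)  # prints ['1']
```

The Washington, D.C. location is thousands of kilometers from the query point, so the second document is filtered out, while the first one matches through its London location alone.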

See also

  • In addition to indexing multiple geographical points, solr.SpatialRecursivePrefixTreeFieldType is also capable of indexing shapes (although Solr needs additional libraries in such cases). If you are interested in this functionality, refer to the page dedicated to spatial search in the official Solr documentation, which is available at https://cwiki.apache.org/confluence/display/solr/Spatial+Search.