
Indexing multiple geographical points

Let's assume we have a website that lets you search for companies not only by keywords but also by geographical location. In the real world, companies tend to have more than a single location. This is where we hit a limitation of the default spatial field used by Solr: it can index only a single location per document. So, we either have to create a separate document for each company location and use group collapsing, or use a different field type that allows multivalued location fields. This recipe will show you how to do the latter.

How to do it...

The following steps will take you through the process of enabling the indexing of multivalued spatial fields.

  1. First, we need to prepare our index structure by adding the following section to the schema.xml file:
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="text_general" indexed="true" stored="true" />
    <field name="loc" type="location_recursive" indexed="true" stored="true" multiValued="true" />
  2. We also need the location_recursive field type defined, so we add the following type to the same schema.xml file:
    <fieldType name="location_recursive" class="solr.SpatialRecursivePrefixTreeFieldType" distErrPct="0.025" maxDistErr="0.000009" units="degrees" />
  3. Now, we can index our data, which looks as follows:
    <add>
     <doc>
      <field name="id">1</field>
      <field name="name">Burger Deluxe</field>
      <field name="loc">51.30,-0.12</field>
      <field name="loc">38.89,-77.03</field>
     </doc>
     <doc>
      <field name="id">2</field>
      <field name="name">Chips and fish D.C. exclusive</field>
      <field name="loc">38.89,-77.03</field>
     </doc>
    </add>
  4. So, if we want to get all companies that are located within 50 kilometers of the centre of London, we will send the following query:
    http://localhost:8983/solr/cookbook/select?q=*:*&fq={!geofilt}&sfield=loc&pt=51.30,-0.12&d=50

    The results returned by Solr will look as follows:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
     <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">1</int>
      <lst name="params">
       <str name="q">*:*</str>
       <str name="pt">51.30,-0.12</str>
       <str name="d">50</str>
       <str name="fq">{!geofilt}</str>
       <str name="sfield">loc</str>
      </lst>
     </lst>
     <result name="response" numFound="1" start="0">
      <doc>
       <str name="id">1</str>
       <str name="name">Burger Deluxe</str>
       <arr name="loc">
        <str>51.30,-0.12</str>
        <str>38.89,-77.03</str>
       </arr>
       <long name="_version_">1468077157967200256</long>
      </doc>
     </result>
    </response>

As we can see, everything works as it should, so let's learn how it was done.

How it works...

Each company is described with three fields: the company identifier (the id field), the company name (the name field), and multivalued company locations (the loc field).

To be able to index multiple locations, we use a new field type that we defined, location_recursive. It uses the solr.SpatialRecursivePrefixTreeFieldType class, which was introduced in Solr 4. It offers more features than the solr.LatLonType field type and filters spatial data faster. We configured it using three properties:

  • distErrPct: This defines the default precision for non-point shapes, both at index time and at query time. The value of the property can vary from 0.0 to 0.5. The closer the value is to 0, the more precise the field will be, but the indexing will be slower, and the index will be larger. If we set the value of the property closer to 0.5, the queries against the field will be faster, but at the cost of less precision.
  • maxDistErr: This defines the highest level of detail required for the indexed data. If the property is left unset, the default detail level is one meter, which is about 0.000009 degrees, exactly the value we used. The setting is required for the solr.SpatialRecursivePrefixTreeFieldType field type to internally calculate a spatial grid.
  • units: This is the unit used by the type; right now, the only value possible is degrees.
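To see where the 0.000009 value for maxDistErr comes from, we can convert one meter to degrees of arc. The following is only a rough sketch in Python; the mean Earth radius of 6,371 km and the meters_to_degrees helper are our own assumptions for illustration and are not part of Solr.

```python
import math

# Assumed mean Earth radius in meters (not a Solr setting).
EARTH_RADIUS_M = 6_371_000


def meters_to_degrees(meters: float) -> float:
    """Convert a distance along a great circle to degrees of arc."""
    meters_per_degree = 2 * math.pi * EARTH_RADIUS_M / 360
    return meters / meters_per_degree


one_meter_in_degrees = meters_to_degrees(1.0)
# Rounded to six decimal places, this is the value used for maxDistErr.
print(f"{one_meter_in_degrees:.6f}")
```

The same helper can be used to pick a coarser maxDistErr, for example by passing 10 or 100 meters, if such precision is enough for the application.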

As you can see, the first company in our example data has two locations. The first location is the centre of London, and the second location is the centre of Washington, D.C. The second document has a single location, only in Washington, D.C.
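If the company data comes from another system, the update message can be generated instead of written by hand. The following Python sketch builds the same XML as in our example; the build_add_message function and the dictionary layout are hypothetical names chosen for illustration. Note that a multivalued field is expressed simply by repeating the loc field once per location.

```python
import xml.etree.ElementTree as ET


def build_add_message(companies):
    """Build a Solr XML <add> message; each location becomes its own loc field."""
    add = ET.Element("add")
    for company in companies:
        doc = ET.SubElement(add, "doc")
        for field_name in ("id", "name"):
            field = ET.SubElement(doc, "field", {"name": field_name})
            field.text = company[field_name]
        # Repeating the field is how multiple values are sent for one document.
        for point in company["loc"]:
            field = ET.SubElement(doc, "field", {"name": "loc"})
            field.text = point
    return ET.tostring(add, encoding="unicode")


companies = [
    {"id": "1", "name": "Burger Deluxe",
     "loc": ["51.30,-0.12", "38.89,-77.03"]},
    {"id": "2", "name": "Chips and fish D.C. exclusive",
     "loc": ["38.89,-77.03"]},
]

message = build_add_message(companies)
print(message)
```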

Our query asks for all documents (q=*:*) and uses the geofilt filter (fq={!geofilt}). The geofilt filter needs three additional parameters to be passed:

  • sfield: This is the field used for spatial search, our loc field.
  • pt: This is the latitude and longitude of the point from which the distance will be calculated. In our case, it is the centre of London.
  • d: This is the distance from the given point. In our case, it is 50, which means 50 kilometers.
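When the query is sent from an application, the parameters described above have to be URL-encoded. The following Python sketch assembles the request URL with the standard library only; the geofilt_url function is a hypothetical helper, and the host, port, and cookbook collection name are taken from our example and will differ in other deployments.

```python
from urllib.parse import urlencode


def geofilt_url(base_url, sfield, pt, d_km):
    """Return a select URL that filters documents with the geofilt parser."""
    params = {
        "q": "*:*",          # match all documents
        "fq": "{!geofilt}",  # spatial filter; reads sfield, pt, and d
        "sfield": sfield,    # spatial field to search on
        "pt": pt,            # "latitude,longitude" of the center point
        "d": d_km,           # radius in kilometers
    }
    return f"{base_url}?{urlencode(params)}"


url = geofilt_url(
    "http://localhost:8983/solr/cookbook/select", "loc", "51.30,-0.12", 50
)
print(url)
```

The special characters in the q, fq, and pt values are percent-encoded by urlencode, so the printed URL looks different from the one typed by hand in a browser, but Solr decodes it to the same parameters.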

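As you can see, only a single document is returned by the query. The first document has one of its locations in London, within 50 kilometers of the given point, while the second document is located only in Washington, D.C., which means that everything works as it should.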
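We can verify this result outside of Solr. The following Python sketch uses the haversine formula with an assumed mean Earth radius of 6,371 km; it is not how Solr computes the result internally (Solr uses a prefix tree grid), and the haversine_km, parse_point, and matches helpers are hypothetical names. It only illustrates the rule that a document with a multivalued location field matches when at least one of its points lies within the given distance.

```python
import math

# Assumed mean Earth radius in kilometers.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def parse_point(value):
    """Parse a "latitude,longitude" string into a pair of floats."""
    lat, lon = value.split(",")
    return float(lat), float(lon)


def matches(doc_points, center, d_km):
    """True if at least one of the document's points is within d_km of center."""
    c_lat, c_lon = parse_point(center)
    return any(
        haversine_km(c_lat, c_lon, *parse_point(p)) <= d_km for p in doc_points
    )


burger_deluxe = ["51.30,-0.12", "38.89,-77.03"]
chips_and_fish = ["38.89,-77.03"]

print(matches(burger_deluxe, "51.30,-0.12", 50))   # document 1
print(matches(chips_and_fish, "51.30,-0.12", 50))  # document 2
```

The first call reports a match because one of the two points is the query point itself; the second does not, because Washington, D.C. is several thousand kilometers away from London.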

See also

  • In addition to indexing multiple geographical points, solr.SpatialRecursivePrefixTreeFieldType is also capable of indexing shapes (although Solr needs additional libraries in such cases). If you are interested in this functionality, refer to the official Solr documentation and the page dedicated to spatial search, which is available at https://cwiki.apache.org/confluence/display/solr/Spatial+Search.