
Indexing multiple geographical points

Let's assume we have a website that allows you to search for companies not only by keywords but also by geographical location. In the real world, companies tend to have more than a single location. This is where we hit a limitation of the default spatial field used by Solr: it can index only a single location per document. So, we either have to create a separate document for each company location and use group collapsing, or we can use a different field type that allows multivalued location fields. This recipe will show you how to do the latter.

How to do it...

The following steps will take you through the process of enabling the indexing of multivalued spatial fields.

  1. First, we need to prepare our index structure by adding the following section to the schema.xml file:
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="name" type="text_general" indexed="true" stored="true" />
    <field name="loc" type="location_recursive" indexed="true" stored="true" multiValued="true" />
  2. We also need the location_recursive field type defined, so we add the following type to the same schema.xml file:
    <fieldType name="location_recursive" class="solr.SpatialRecursivePrefixTreeFieldType" distErrPct="0.025" maxDistErr="0.000009" units="degrees" />
  3. Now, we can index our data, which looks as follows:
    <add>
     <doc>
      <field name="id">1</field>
      <field name="name">Burger Deluxe</field>
      <field name="loc">51.30,-0.12</field>
      <field name="loc">38.89,-77.03</field>
     </doc>
     <doc>
      <field name="id">2</field>
      <field name="name">Chips and fish D.C. exclusive</field>
      <field name="loc">38.89,-77.03</field>
     </doc>
    </add>
  4. So, if we want to get all companies that are located within 50 kilometers of the centre of London, we will send the following query:
    http://localhost:8983/solr/cookbook/select?q=*:*&fq={!geofilt}&sfield=loc&pt=51.30,-0.12&d=50

    The results returned by Solr will look as follows:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
     <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">1</int>
      <lst name="params">
       <str name="q">*:*</str>
       <str name="pt">51.30,-0.12</str>
       <str name="d">50</str>
       <str name="fq">{!geofilt}</str>
       <str name="sfield">loc</str>
      </lst>
     </lst>
     <result name="response" numFound="1" start="0">
      <doc>
       <str name="id">1</str>
       <str name="name">Burger Deluxe</str>
       <arr name="loc">
        <str>51.30,-0.12</str>
        <str>38.89,-77.03</str>
       </arr>
       <long name="_version_">1468077157967200256</long>
      </doc>
     </result>
    </response>

As we can see, everything works as it should, so let's learn how it was done.

How it works...

Each company is described with three fields: the company identifier (the id field), the company name (the name field), and multivalued company locations (the loc field).

To be able to index multiple locations, we use a new field type that we defined, location_recursive. It uses the solr.SpatialRecursivePrefixTreeFieldType class, which is new in Solr 4. It offers more features than the solr.LatLonType field type and is faster at filtering spatial data. We configured it using three properties:

  • distErrPct: This defines the default precision for shapes other than points, expressed as a fraction between 0.0 and 0.5. The closer the value is to 0, the more precise the indexed shape will be, but indexing will be slower and the index will be larger. If we set the value closer to 0.5, queries against the field will be faster, but at the cost of precision.
  • maxDistErr: This defines the highest level of detail required for the indexed data. If the property is left empty, the default detail level is one meter, which is about 0.000009 degrees, exactly the value we used. The setting is required for the solr.SpatialRecursivePrefixTreeFieldType field type to calculate its internal spatial grid.
  • units: This is the unit used by the type; right now, the only value possible is degrees.
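To see where the 0.000009 value for maxDistErr comes from, we can convert one meter to degrees of arc ourselves. The following is only a rough sketch: the mean Earth radius of 6,371 km and the meters_to_degrees helper are our own assumptions for illustration, not something Solr exposes.

```python
import math

# Assumed mean Earth radius in kilometers (an illustrative constant, not read from Solr).
EARTH_MEAN_RADIUS_KM = 6371.0


def meters_to_degrees(meters: float) -> float:
    """Convert a distance along a great circle to degrees of arc."""
    km_per_degree = 2 * math.pi * EARTH_MEAN_RADIUS_KM / 360.0
    return (meters / 1000.0) / km_per_degree


one_meter = meters_to_degrees(1.0)
print(f"{one_meter:.6f}")  # prints 0.000009
```

The same helper can be used to pick a coarser maxDistErr if meter-level detail is not needed, for example meters_to_degrees(100.0) for a grid that resolves roughly 100 meters.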

As you can see, the first company in our example data has two locations. The first location is the centre of London, and the second location is the centre of Washington, D.C. The second document has a single location, only in Washington, D.C.
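If the company data lives in an application rather than in a hand-written file, the update message can be generated. The following minimal sketch uses only the Python standard library; the build_add_message function and the shape of the companies list are hypothetical and chosen for illustration. The important part is that each location becomes its own loc field element.

```python
import xml.etree.ElementTree as ET


def build_add_message(companies):
    """Build a Solr XML <add> message with one loc field per location."""
    add = ET.Element("add")
    for company in companies:
        doc = ET.SubElement(add, "doc")
        for name in ("id", "name"):
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(company[name])
        # A multivalued field is simply repeated once per value.
        for lat, lon in company["locations"]:
            field = ET.SubElement(doc, "field", name="loc")
            field.text = f"{lat:.2f},{lon:.2f}"
    return ET.tostring(add, encoding="unicode")


# The same two companies as in the recipe's example data.
companies = [
    {"id": 1, "name": "Burger Deluxe",
     "locations": [(51.30, -0.12), (38.89, -77.03)]},
    {"id": 2, "name": "Chips and fish D.C. exclusive",
     "locations": [(38.89, -77.03)]},
]

message = build_add_message(companies)
print(message)
```

The resulting string can be sent to Solr in the same way as the hand-written file from step 3.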

Our query asks for all documents (q=*:*) and uses the geofilt filter (fq={!geofilt}). The geofilt filter needs three additional parameters to be passed:

  • sfield: This is the field used for spatial search, our loc field.
  • pt: This is the latitude and longitude of the point from which the distance will be calculated. In our case, it is the centre of London.
  • d: This is the distance from the given point. In our case, it is 50, which means 50 kilometers.
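When the query is sent from code, it is safer to let a library encode the parameters than to concatenate the URL by hand. The following sketch assembles the same request with Python's standard library; the build_geofilt_url helper is a hypothetical name, and the base URL is the one used in this recipe.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_geofilt_url(base_url, sfield, pt, distance_km):
    """Assemble a select URL that filters by distance using geofilt."""
    params = {
        "q": "*:*",            # match all documents
        "fq": "{!geofilt}",    # spatial filter
        "sfield": sfield,      # field holding the locations
        "pt": pt,              # "latitude,longitude" of the centre point
        "d": distance_km,      # radius in kilometers
    }
    return f"{base_url}?{urlencode(params)}"


url = build_geofilt_url(
    "http://localhost:8983/solr/cookbook/select", "loc", "51.30,-0.12", 50
)
print(url)
```

The special characters in q and fq are percent-encoded in the printed URL; Solr decodes them back to the values shown in the params section of the response.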

As you can see, only a single document is returned by the query: the first document, which is the only one with a location in London. This means that everything works as it should.
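We can sanity-check this result outside Solr with a great-circle distance calculation. The sketch below uses the haversine formula and an assumed mean Earth radius of 6,371 km; it only approximates what the filter decides and is not how Solr evaluates geofilt internally, since the field type works on a spatial grid. The haversine_km and matches names are our own.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def matches(locations, center, d_km):
    """A multivalued document matches if any of its locations is in range."""
    return any(haversine_km(*center, *loc) <= d_km for loc in locations)


center = (51.30, -0.12)                          # the pt parameter
doc1 = [(51.30, -0.12), (38.89, -77.03)]         # Burger Deluxe
doc2 = [(38.89, -77.03)]                         # Chips and fish D.C. exclusive

print(matches(doc1, center, 50))  # True
print(matches(doc2, center, 50))  # False
```

Note that a document with several locations matches as soon as one of them falls inside the radius, which is why the first document is returned even though its Washington, D.C. location is thousands of kilometers away.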

See also

  • In addition to indexing multiple geographical points, solr.SpatialRecursivePrefixTreeFieldType is also capable of indexing shapes (although Solr needs additional libraries in such cases). If you are interested in such functionalities, refer to the official Solr documentation and the page dedicated to spatial search, which is available at https://cwiki.apache.org/confluence/display/solr/Spatial+Search.