官术网_书友最值得收藏!

Extending inheritance – table partitioning

Table partitioning is an approach specific to PostgreSQL that extends inheritance to model tables that typically do not vary from each other in the available fields, but where the child tables represent logical partitioning of the data based on a variety of factors, be it time, value ranges, classifications, or, in our case, spatial relationships. The advantages of partitioning include improved query performance due to smaller indexes and targeted scans of data, bulk loads, and deletes that bypass the costs of maintenance functions like VACUUM. It can thus be used to put commonly used data on a faster and more expensive storage, and the remaining data on a slower and cheaper storage. In combination with PostGIS, we get the novel power of spatial partitioning, which is a really powerful feature for large datasets.

Getting ready

We could use many examples of large datasets that could benefit from partitioning. In our case, we will use a contour dataset. Contours are useful ways to represent terrain data, as they are well established in use, and thus commonly interpreted. Contours can also be used to compress terrain data into linear representations, thus allowing them to be shown in conjunction with other data easily.

The problem is, the storage of contour data can be quite expensive. Two-foot contours for a single US county can take 20 to 40 GB, and storing such data for a larger area such as a region or nation can become quite prohibitive from the standpoint of accessing the appropriate portion of the dataset in an efficient way.

How to do it...

The first step in this case may be to prepare the data. If we had a monolithic contour table called cuy_contours_2, we could choose to clip the data to a series of rectangles that would serve as our table partitions; in this case, chp02.contour_clip, using the following query:

CREATE TABLE chp02.contour_2_cm_only AS
 SELECT contour.elevation, contour.gid, contour.div_10, contour.div_20, contour.div_50,
 contour.div_100, cc.id, ST_Intersection(contour.the_geom, cc.the_geom) AS the_geom FROM
 chp02.cuy_contours_2 AS contour, chp02.contour_clip as cc
 WHERE ST_Within(contour.the_geom,cc.the_geom
 OR
 ST_Crosses(contour.the_geom,cc.the_geom);

We are performing two tests here in our query. We are using ST_Within, which tests whether a given contour is entirely within our area of interest. If so, we perform an intersection; the resultant geometry should just be the geometry of the contour.

The ST_Crosses function checks whether the contour crosses the boundary of the geometry we are testing. This should capture all the geometries lying partially inside and partially outside our areas. These are the ones that we will truly intersect to get the resultant shape.

In our case, it is easier and we don't require the previous step. Our contour shapes are already individual shapefiles clipped to rectangular boundaries, as shown in the following screenshot:

How to do it...

Since the data is already clipped into the chunks needed for our partitions, we can just continue to create the appropriate partitions.

Much as with inheritance, we start by creating our parent table using the following query:

CREATE TABLE chp02.contours
(
 gid serial NOT NULL,
 elevation integer,
 __gid double precision,
 the_geom geometry(MultiLineStringZM,3734),
 CONSTRAINT contours_pkey PRIMARY KEY (gid)
)
WITH (
 OIDS=FALSE
);

Here again, we maintain our constraints such as PRIMARY KEY and specify the geometry type (MultiLineStringZM), not because these will propagate to the child tables, but for any client software accessing the parent table to anticipate such constraints.

Now we may begin to create tables that inherit from our parent table. In the process, we will create a CHECK constraint specifying the limits of our associated geometry using the following query:

CREATE TABLE chp02.contour_N2260630
 (CHECK (ST_CoveredBy(the_geom,ST_GeomFromText('POLYGON((2260000 630000, 2260000 635000,
 2265000 635000, 2265000 630000, 2260000 630000))',3734)
 ))) INHERITS (chp02.contours);

We can complete with similar CREATE TABLE queries for our remaining tables, as follows:

CREATE TABLE chp02.contour_N2260635
 (CHECK ( ST_CoveredBy(the_geom,ST_GeomFromText('POLYGON((2260000 635000, 2260000 640000,
 2265000 640000, 2265000 635000, 2260000 635000))', 3734)
 ))) INHERITS (chp02.contours);
CREATE TABLE chp02.contour_N2260640
 (CHECK ( ST_CoveredBy(the_geom,ST_GeomFromText('POLYGON((2260000 640000, 2260000 645000,
 2265000 645000, 2265000 640000, 2260000 640000))', 3734)
 ))) INHERITS (chp02.contours);
CREATE TABLE chp02.contour_N2265630
 (CHECK ( ST_CoveredBy(the_geom,ST_GeomFromText('POLYGON((2265000 630000, 2265000 635000,
 2270000 635000, 2270000 630000, 2265000 630000))', 3734)
 ))) INHERITS (chp02.contours);
CREATE TABLE chp02.contour_N2265635
 (CHECK ( ST_CoveredBy(the_geom,ST_GeomFromText('POLYGON((2265000 635000, 2265000 640000,
 2270000 640000, 2270000 635000, 2265000 635000))', 3734)
 ))) INHERITS (chp02.contours);
CREATE TABLE chp02.contour_N2265640
 (CHECK ( ST_CoveredBy(the_geom,ST_GeomFromText('POLYGON((2265000 640000, 2265000 645000,
 2270000 645000, 2270000 640000, 2265000 640000))', 3734)
 ))) INHERITS (chp02.contours);
CREATE TABLE chp02.contour_N2270630
 (CHECK ( ST_CoveredBy(the_geom,ST_GeomFromText('POLYGON((2270000 630000, 2270000 635000,
 2275000 635000, 2275000 630000, 2270000 630000))', 3734)
 ))) INHERITS (chp02.contours);
CREATE TABLE chp02.contour_N2270635
 (CHECK ( ST_CoveredBy(the_geom,ST_GeomFromText('POLYGON((2270000 635000, 2270000 640000,
 2275000 640000, 2275000 635000, 2270000 635000))', 3734)
 ))) INHERITS (chp02.contours);
CREATE TABLE chp02.contour_N2270640
 (CHECK ( ST_CoveredBy(the_geom,ST_GeomFromText('POLYGON((2270000 640000, 2270000 645000,
 2275000 645000, 2275000 640000, 2270000 640000))', 3734)
 ))) INHERITS (chp02.contours);

Next we can load our contours into each of our child tables using the following command. If we wanted to, we could even implement a trigger on the parent table, which would place each insert into its correct child table, though this might incur performance costs. In loading our contours, we use the -a flag to specify that we want to append the data.

shp2pgsql -s 3734 -a -i -I -W LATIN1 -g the_geom N2260630 chp02.contour_N2260630 | psql -U me -d postgis_cookbook

How it works...

The CHECK constraint in combination with inheritance is all it takes to build table partitioning. In this case, we're using a bounding box as our CHECK constraint and simply inheriting the columns from the parent table. Now that we have this in place, queries against the parent table will check our CHECK constraints first before employing a query. This also allows us to place any of our lesser-used contour tables on cheaper and slower storage, thus allowing for cost-effective optimizations of large datasets. This structure is also beneficial for rapidly changing data as updates can be applied to an entire area; the entire table for that area can be efficiently dropped and repopulated without traversing across the dataset.

Unfortunately, some of the promises of table partitioning, such as being able to bypass spatial indexes by using ranges, are not yet available.

See also

For more on table inheritance in general, particularly the flexibility associated with the usage of alternate columns in the child table, see the previous recipe, Structuring spatial data with table inheritance.

主站蜘蛛池模板: 泸西县| 玉田县| 丰台区| 夏邑县| 通山县| 宜都市| 都匀市| 汨罗市| 淅川县| 安泽县| 天津市| 宾阳县| 乐东| 吉木萨尔县| 余干县| 泾源县| 通城县| 水城县| 福州市| 大冶市| 房山区| 定远县| 凯里市| 新乡县| 屯门区| 乌拉特前旗| 夏津县| 上林县| 宾川县| 大同市| 顺昌县| 开江县| 武隆县| 雷山县| 十堰市| 扎赉特旗| 嘉禾县| 亳州市| 友谊县| 河南省| 宁波市|