官术网_书友最值得收藏!

Applying grouping sets

The GROUP BY clause will turn many rows into one row per group. However, if you do reporting in real life, you might also be interested in the overall average. One additional line might be needed.

Here is how this can be achieved:

test=# SELECT region, avg(production) 
FROM t_oil
GROUP BY ROLLUP (region);
region | avg ----------------+----------------------- Middle East | 1992.6036866359447005
North America | 4541.3623188405797101
| 2607.5139860139860140 (3 rows)

ROLLUP will inject an additional line, which will contain the overall average. If you do reporting, it is highly likely that a summary line will be needed. Instead of running two queries, PostgreSQL can provide the data by running just a single query. There is also a second thing you might notice here: different versions of PostgreSQL might return data in a different order. The reason for that is that in PostgreSQL 10.0 the way those grouping sets are implemented has improved significantly. Back in 9.6 and before, PostgreSQL had to do a lot of sorting. Starting with 10.0, it is already possible to use hashing for those operations, which will speed things up, dramatically in many cases:

test=# explain SELECT region, avg(production) 
FROM t_oil
GROUP BY ROLLUP (region);

QUERY PLAN ---------------------------------------------------------- MixedAggregate (cost=0.00..17.31 rows=3 width=44) Hash Key: region Group Key: () -> Seq Scan on t_oil (cost=0.00..12.44 rows=644 width=16) (4 rows)

In case you want data to be sorted, and if you want to ensure that all versions return the data in exactly the same order, it is necessary to add an ORDER BY clause to the query.

Of course, this kind of operation can also be used if you are grouping by more than just one column:

test=# SELECT region, country, avg(production) 
FROM t_oil
WHERE country IN ('USA', 'Canada', 'Iran', 'Oman')
GROUP BY ROLLUP (region, country);
region | country | avg ---------------+---------+----------------------- Middle East | Iran | 3631.6956521739130435
Middle East | Oman | 586.4545454545454545
Middle East | | 2142.9111111111111111
North America | Canada | 2123.2173913043478261
North America | USA | 9141.3478260869565217
North America | | 5632.2826086956521739
| | 3906.7692307692307692 (7 rows)

In this example, PostgreSQL will inject three lines into the result set. One line will be injected for the Middle East and one for North America. On top of that, we will get a line for the overall averages. If you are building a web application, the current result is ideal because you can easily build a GUI to drill into the result set by filtering out the null values.

ROLLUP is suitable when you instantly want to display a result. I have always used it to display final results to end users. However, if you are doing reporting, you might want to pre-calculate more data to ensure more flexibility. The CUBE keyword is what you might have been looking for:

test=# SELECT region, country, avg(production) 
FROM t_oil WHERE country IN ('USA', 'Canada', 'Iran', 'Oman')
GROUP BY CUBE (region, country);

region | country | avg
---------------+---------+-----------------------
Middle East | Iran | 3631.6956521739130435
Middle East | Oman | 586.4545454545454545
Middle East | | 2142.9111111111111111
North America | Canada | 2123.2173913043478261
North America | USA | 9141.3478260869565217
North America | | 5632.2826086956521739
| | 3906.7692307692307692
| Canada | 2123.2173913043478261
| Iran | 3631.6956521739130435
| Oman | 586.4545454545454545
| USA | 9141.3478260869565217 (11 rows)

Note that even more rows have been added to the result. CUBE will create the same data as: GROUP BY region, country + GROUP BY region + GROUP BY country + the overall average. So, the whole idea is to extract many results and various levels of aggregation at once. The resulting cube contains all possible combinations of groups.

ROLLUP and CUBE are really just convenience features on top of the GROUPING SETS clause. With the GROUPING SETS clause, you can explicitly list the aggregates you want:

test=# SELECT region, country, avg(production) 
FROM t_oil WHERE country IN ('USA', 'Canada', 'Iran', 'Oman')
GROUP BY GROUPING SETS ( (), region, country);
region | country | avg ----------------+---------+----------------------- Middle East | | 2142.9111111111111111 North America | | 5632.2826086956521739 | | 3906.7692307692307692 | Canada | 2123.2173913043478261 | Iran | 3631.6956521739130435 | Oman | 586.4545454545454545 | USA | 9141.3478260869565217 (7 rows)

In this, I went for three grouping sets: the overall average, GROUP BY region, and GROUP BY country. If you want regions and countries combined, use (region, country).

主站蜘蛛池模板: 贞丰县| 大邑县| 通辽市| 阿瓦提县| 贵港市| 乌拉特后旗| 宁化县| 曲阜市| 洛扎县| 阿克陶县| 澳门| 舒兰市| 囊谦县| 义马市| 湄潭县| 紫云| 贞丰县| 那坡县| 开原市| 老河口市| 乐东| 原阳县| 云和县| 磐安县| 双柏县| 昔阳县| 海丰县| 正定县| 仪征市| 六安市| 大庆市| 新巴尔虎左旗| 唐海县| 隆尧县| 琼结县| 祁东县| 郓城县| 诸暨市| 昭苏县| 青州市| 建湖县|