- Mastering PostgreSQL 10
- Hans Jürgen Sch?nig
- 367字
- 2021-06-30 19:04:02
Partitioning data
So far, the same result can also easily be achieved using a subselect. However, if you want more than just the overall average, subselects will turn your queries into nightmares. Suppose, you don't just want the overall average but the average of the country you are dealing with. A PARTITION BY clause is what you need:
test=# SELECT country, year, production, consumption,
avg(production) OVER (PARTITION BY country)
FROM t_oil; country | year | production | consumption | avg ----------+-------+------------+-------------+----------- Canada | 1965 | 920 | 1108 | 2123.2173 Canada | 2010 | 3332 | 2316 | 2123.2173 Canada | 2009 | 3202 | 2190 | 2123.2173 ... Iran | 1966 | 2132 | 148 | 3631.6956 Iran | 2010 | 4352 | 1874 | 3631.6956 Iran | 2009 | 4249 | 2012 | 3631.6956 ...
The point here is that each country will be assigned to the average of the country. The OVER clause defines the window we are looking at. In this case, the window is the country the row belongs to. In other words, the query returns the rows compared to all rows in this country.
Basically, a PARTITION BY clause takes any expression. Usually, most people will use a column to partition the data. Here is an example:
test=# SELECT year, production, avg(production) OVER (PARTITION BY year < 1990) FROM t_oil WHERE country = 'Canada' ORDER BY year;
year | production | avg -------+------------+----------------------- 1965 | 920 | 1631.6000000000000000 1966 | 1012 | 1631.6000000000000000 ... 1990 | 1967 | 2708.4761904761904762 1991 | 1983| 2708.4761904761904762 1992 | 2065| 2708.4761904761904762 ...
The point is that data is split using the expression. year < 1990 can return two values: true or false. Depending on the group a year is in, it will be assigned to the pre-1990 average or to the post-1990 average. PostgreSQL is really flexible here. Using functions to determine group membership is not uncommon in real-world applications.