書名： Mastering PostgreSQL 10
作者名： Hans Jürgen Sch?nig
本章字數： 852字
更新時間： 2021-06-30 19:03:52

Watching VACUUM at work

After this introduction, it is time to see VACUUM in action. I have included this section here because my practical work as a PostgreSQL consultant and supporter (http://postgresql-support.de/) indicates that most people only have a very vague understanding of what happens on the storage side.

To stress this point again, in most cases, VACUUM will not shrink your tables; space is usually not returned to the filesystem.

Here is my example:

CREATE TABLE t_test (id int) WITH (autovacuum_enabled = off); 
INSERT INTO t_test 
   SELECT * FROM generate_series(1, 100000);

The idea is to create a simple table containing 100,000 rows. Note that it is possible to turn autovacuum off for specific tables. Usually, this is not a good idea for most applications. However, there are corner case, where autovacuum_enabled = off makes sense. Just consider a table whose life cycle is very short. It does not make sense to clean out tuples if the developer already knows that the entire table will be dropped within seconds. In data warehousing, this can be the case if you use tables as staging areas. VACUUM is turned off in this example to ensure that nothing happens in the background; all you see is triggered by me and not by some process.

First of all, the size of the table is checked:

test=# SELECT pg_size_pretty(pg_relation_size('t_test')); 
pg_size_pretty
----------------
 3544 kB
(1 row)

pg_relation_size returns the size of a table in bytes. pg_size_pretty will take this number and turn it into something human-readable.

Then, all rows in the table will be updated:

test=# UPDATE t_test SET id = id + 1;  
UPDATE 100000

What happens is highly important to understand PostgreSQL; the database engine has to copy all the rows. Why? First of all, we don't know whether the transaction will be successful, so the data cannot be overwritten. The second important aspect is that a concurrent transaction might still be seeing the old version of the data.

The UPDATE operation will copy rows.

Logically, the size of the table will be larger after the change has been made:

test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty 
----------------
 7080 kB
(1 row)

After UPDATE, people might try to return space to the filesystem:

test=# VACUUM t_test;  
VACUUM

As stated previously, VACUUM does not return space to the filesystem in most cases. Instead, it will allow space to be reused. The table, therefore, does not shrink at all:

test=# SELECT pg_size_pretty(pg_relation_size('t_test')); 
 pg_size_pretty 
----------------
 7080 kB
(1 row)

However, the next UPDATE will not make the table grow because it will eat the free space inside the table. Only a second UPDATE would make the table grow again, because all the space is gone and so additional storage is needed:

test=# UPDATE t_test SET id = id + 1;  
UPDATE 100000 
test=# SELECT pg_size_pretty(pg_relation_size('t_test')); 
 pg_size_pretty 
----------------
 7080 kB
(1 row)

test=# UPDATE t_test SET id = id + 1;  
UPDATE 100000 
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty 
----------------
 10 MB
(1 row)

If I had to decide on a single thing, you should remember after reading this book, this is it. Understanding storage is the key to performance and administration in general.

Let us run some more queries:

VACUUM t_test; 
UPDATE t_test SET id = id + 1;  
VACUUM t_test;

Again the size is unchanged. Let's see what is inside the table:

test=# SELECT ctid, * FROM t_test ORDER BY ctid DESC;
ctid       | id
-----------+-------- 
(1327, 46) | 112
(1327, 45) | 111
(1327, 44) | 110
(884, 20) | 99798
(884, 19) | 99797
...

ctid is the physical position of a row on a disk. Using ORDER BY ctid DESC, you will basically read the table backwards in the physical order. Why should you care? The reason is that there are some very small values and some very big values at the end of the table. What happens if they are deleted?

test=# DELETE FROM t_test WHERE id > 99000 OR id < 1000;  
DELETE 1999 
test=# VACUUM t_test;  
VACUUM 
test=# SELECT pg_size_pretty(pg_relation_size('t_test')); 
 pg_size_pretty 
----------------
 3504 kB
(1 row)

Although only 2% of the data has been deleted, the size of the table has gone down by two thirds. The reason is that if VACUUM only finds dead rows after a certain position in the table, it can return space to the filesystem. This is the only case in which you will actually see the table size go down. Of course, normal users have no control over the physical position of data on the disk. Therefore, storage consumption will most likely stay somewhat the same unless all rows are deleted.

Why are there so many small and big values at the end of the table anyway? After the table is initially populated with 100,000 rows, the last block is not completely full, so the first UPDATE will fill up the last block with changes. This naturally shuffles the end of the table a bit. In this carefully crafted example, this is the reason for the strange layout at the end of the table.

In real-world applications, the impact of this observation cannot be stressed enough. There is no performance tuning without really understanding storage.

官术网_书友最值得收藏!

Mastering PostgreSQL 10

Watching VACUUM at work