- Mastering PostgreSQL 10
- Hans Jürgen Sch?nig
- 852字
- 2021-06-30 19:03:52
Watching VACUUM at work
After this introduction, it is time to see VACUUM in action. I have included this section here because my practical work as a PostgreSQL consultant and supporter (http://postgresql-support.de/) indicates that most people only have a very vague understanding of what happens on the storage side.
To stress this point again, in most cases, VACUUM will not shrink your tables; space is usually not returned to the filesystem.
Here is my example:
CREATE TABLE t_test (id int) WITH (autovacuum_enabled = off); INSERT INTO t_test SELECT * FROM generate_series(1, 100000);
The idea is to create a simple table containing 100,000 rows. Note that it is possible to turn autovacuum off for specific tables. Usually, this is not a good idea for most applications. However, there are corner case, where autovacuum_enabled = off makes sense. Just consider a table whose life cycle is very short. It does not make sense to clean out tuples if the developer already knows that the entire table will be dropped within seconds. In data warehousing, this can be the case if you use tables as staging areas. VACUUM is turned off in this example to ensure that nothing happens in the background; all you see is triggered by me and not by some process.
First of all, the size of the table is checked:
test=# SELECT pg_size_pretty(pg_relation_size('t_test')); pg_size_pretty
----------------
3544 kB
(1 row)
pg_relation_size returns the size of a table in bytes. pg_size_pretty will take this number and turn it into something human-readable.
Then, all rows in the table will be updated:
test=# UPDATE t_test SET id = id + 1;
UPDATE 100000
What happens is highly important to understand PostgreSQL; the database engine has to copy all the rows. Why? First of all, we don't know whether the transaction will be successful, so the data cannot be overwritten. The second important aspect is that a concurrent transaction might still be seeing the old version of the data.
The UPDATE operation will copy rows.
Logically, the size of the table will be larger after the change has been made:
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
pg_size_pretty
----------------
7080 kB
(1 row)
After UPDATE, people might try to return space to the filesystem:
test=# VACUUM t_test;
VACUUM
As stated previously, VACUUM does not return space to the filesystem in most cases. Instead, it will allow space to be reused. The table, therefore, does not shrink at all:
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
pg_size_pretty
----------------
7080 kB
(1 row)
However, the next UPDATE will not make the table grow because it will eat the free space inside the table. Only a second UPDATE would make the table grow again, because all the space is gone and so additional storage is needed:
test=# UPDATE t_test SET id = id + 1; UPDATE 100000
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
pg_size_pretty
----------------
7080 kB
(1 row)
test=# UPDATE t_test SET id = id + 1; UPDATE 100000
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
pg_size_pretty
----------------
10 MB
(1 row)
If I had to decide on a single thing, you should remember after reading this book, this is it. Understanding storage is the key to performance and administration in general.
Let us run some more queries:
VACUUM t_test; UPDATE t_test SET id = id + 1; VACUUM t_test;
Again the size is unchanged. Let's see what is inside the table:
test=# SELECT ctid, * FROM t_test ORDER BY ctid DESC;
ctid | id
-----------+--------
(1327, 46) | 112
(1327, 45) | 111
(1327, 44) | 110
(884, 20) | 99798
(884, 19) | 99797
...
ctid is the physical position of a row on a disk. Using ORDER BY ctid DESC, you will basically read the table backwards in the physical order. Why should you care? The reason is that there are some very small values and some very big values at the end of the table. What happens if they are deleted?
test=# DELETE FROM t_test WHERE id > 99000 OR id < 1000; DELETE 1999
test=# VACUUM t_test; VACUUM
test=# SELECT pg_size_pretty(pg_relation_size('t_test')); pg_size_pretty
----------------
3504 kB
(1 row)
Although only 2% of the data has been deleted, the size of the table has gone down by two thirds. The reason is that if VACUUM only finds dead rows after a certain position in the table, it can return space to the filesystem. This is the only case in which you will actually see the table size go down. Of course, normal users have no control over the physical position of data on the disk. Therefore, storage consumption will most likely stay somewhat the same unless all rows are deleted.
In real-world applications, the impact of this observation cannot be stressed enough. There is no performance tuning without really understanding storage.
- 基于C語言的程序設計
- 構建高質量的C#代碼
- Go Machine Learning Projects
- 會聲會影X5視頻剪輯高手速成
- 人工免疫算法改進及其應用
- Python Artificial Intelligence Projects for Beginners
- 21天學通Java
- 系統安裝與重裝
- Mastering Predictive Analytics with scikit:learn and TensorFlow
- 電腦故障排除與維護終極技巧金典
- 工業機器人操作
- Hands-On Microservices with C#
- PVCBOT零基礎機器人制作(第2版)
- 中老年人學數碼照片后期處理
- ASP.NET 4.0 MVC敏捷開發給力起飛