- Mastering PostgreSQL 9.6
- Hans Jurgen Schonig
- 539字
- 2021-07-09 19:57:19
Reducing space consumption
Indexing is nice and its main purpose is to speed up things as much as possible. As with all good stuff, indexing comes with a price tag: space consumption. To do its magic, an index has to store values in an organized fashion. If your table contains 10 million integer values, the index belonging to the table will logically contain those 10 million integer values.
A B-tree will contain a pointer to each row in the table, and so it is certainly not free of charge. To figure out how much space an index will need, you can ask psql using the \di+ command:
test=# \di+
List of relations
Schema | Name | Type | Owner | Table | Size
--------+------------+-------+-------+----------+-------
public | idx_cos | index | hs | t_random | 86 MB
public | idx_id | index | hs | t_test | 86 MB
public | idx_name | index | hs | t_test | 86 MB
public | idx_random | index | hs | t_random | 86 MB
(4 rows)
In my database, the staggering amount of 344 MB has been burned to store those indexes. Now, compare this to the amount of storage burned by the underlying tables:
test=# \d+
List of relations
Schema | Name | Type | Owner | Size
--------+---------------+----------+-------+------------
public | t_random | table | hs | 169 MB
public | t_test | table | hs | 169 MB
public | t_test_id_seq | sequence | hs | 8192 bytes
(3 rows)
The size of both tables combined is just 338 MB. In other words, our indexing needs more space than the actual data. In the real world, this is common and actually pretty likely. Recently I visited a Cybertec customer in Germany and I saw a database in which 64% of the database size was made up of indexes that were never used (not a single time over the period of months). So, over-indexing can be an issue just like under-indexing. Remember, those indexes don't just consume space. Every INSERT or UPDATE must maintain the values in the indexes as well. In extreme cases like our example, this vastly decreases write throughput.
If there are just a handful of different values in the table, partial indexes are a solution:
test=# DROP INDEX idx_name;
DROP INDEX
test=# CREATE INDEX idx_name ON t_test (name) WHERE name NOT IN ('hans', 'paul');
CREATE INDEX
In this case, the majority has been excluded from the index and a small, efficient index can be enjoyed:
test=# \di+ idx_name
List of relations
Schema | Name | Type | Owner | Table | Size
--------+----------+-------+-------+--------+-----------
public | idx_name | index | hs | t_test | 8192 bytes
(1 row)
Note that it only makes sense to exclude very frequent values that make up a large part of the table (at least 25% or so). Ideal candidates for partial indexes are gender (we assume that most people are male or female), nationality (assuming that most people in your country have the same nationality), and so on. Of course, applying this kind of trickery requires some deep knowledge of your data, but it certainly pays off.
- Mastering Mesos
- Hands-On Internet of Things with MQTT
- Mastering Proxmox(Third Edition)
- Managing Mission:Critical Domains and DNS
- Hands-On Neural Networks with Keras
- Windows XP中文版應(yīng)用基礎(chǔ)
- 空間傳感器網(wǎng)絡(luò)復(fù)雜區(qū)域智能監(jiān)測(cè)技術(shù)
- 城市道路交通主動(dòng)控制技術(shù)
- JSF2和RichFaces4使用指南
- 基于32位ColdFire構(gòu)建嵌入式系統(tǒng)
- Linux系統(tǒng)下C程序開(kāi)發(fā)詳解
- 實(shí)戰(zhàn)突擊
- 網(wǎng)絡(luò)規(guī)劃與設(shè)計(jì)
- Practical Autodesk AutoCAD 2021 and AutoCAD LT 2021
- 數(shù)據(jù)庫(kù)技術(shù):Access 2003 計(jì)算機(jī)網(wǎng)絡(luò)技術(shù)