官术网_书友最值得收藏!

Using indexes in an intelligent way

So far, applying an index feels like the Holy Grail, which always improves performance magically. However, this is not the case. Indexes can also be pretty pointless in some cases.

Before digging into things more deeply, here is the data structure we have used for this example. Remember that there are only two distinct names and unique IDs:

test=# \d t_test 
Table "public.t_test"
Column | Type | Modifiers
--------+---------+------------------------------------
id | integer | not null default nextval('t_test_id_seq'::regclass)
name | text |
Indexes:
"idx_id" btree (id)

At this point, one index has been defined, which covers the id column. In the next step, the name column will be queried. Before doing this, an index on the name will be created:

test=# CREATE INDEX idx_name ON t_test (name);  
CREATE INDEX 

Now, it is time to see if the index is used correctly:

test=# EXPLAIN SELECT * FROM t_test WHERE name = 'hans2'; 
QUERY PLAN
-----------------------------------------------------
Index Scan using idx_name on t_test
(cost=0.43..4.45 rows=1 width=9)
Index Cond: (name = 'hans2'::text)
(2 rows)

As expected, PostgreSQL will decide on using the index. Most users would expect this. But note that my query says hans2. Remember, hans2 does not exist in the table and the query plan perfectly reflects this. rows=1 indicates that the planner only expects a very small subset of data being returned by the query.

There is not a single row in the table, but PostgreSQL will never estimate zero rows because it would make subsequent estimations a lot harder because useful cost calculations of other nodes in the plan would be close to impossible.

Let's see what happens if we look for more data:

test=# EXPLAIN SELECT * 
FROM t_test
WHERE name = 'hans'
OR name = 'paul';
QUERY PLAN
----------------------------------------------------------
Seq Scan on t_test (cost=0.00..81622.00 rows=3000011 width=9)
Filter: ((name = 'hans'::text) OR (name = 'paul'::text))
(2 rows)

In this case, PostgreSQL will go for a straight sequential scan. Why is that? Why is the system ignoring all indexes? The reason is simple; hans and paul make up the entire dataset because there are no other values (PostgreSQL knows that by checking the system statistics). Therefore, PostgreSQL figures that the entire table has to be read anyway. There is no reason to read all of the index and the full table if reading just the table is sufficient.

In other words, PostgreSQL will not use an index just because there is one. PostgreSQL will use indexes when they make sense. If the number of rows is smaller, PostgreSQL will again consider bitmap scans and normal index scans:

test=# EXPLAIN SELECT * 
FROM t_test
WHERE name = 'hans2'
OR name = 'paul2';


QUERY PLAN
----------------------------------------------------------
Bitmap Heap Scan on t_test (cost=8.88..12.89 rows=1 width=9)
Recheck Cond: ((name = 'hans2'::text) OR (name = 'paul2'::text))
-> BitmapOr (cost=8.88..8.88 rows=1 width=0)
-> Bitmap Index Scan on idx_name
(cost=0.00..4.44 rows=1 width=0)

Index Cond: (name = 'hans2'::text)
-> Bitmap Index Scan on idx_name
(cost=0.00..4.44 rows=1 width=0)

Index Cond: (name = 'paul2'::text)

The most important point to learn here is that execution plans depend on input values.

They are not static and not independent of the data inside the table. This is a very important observation, which has to be kept in mind all the time. In real-world examples, the fact that plans change can often be the reason for unpredictable runtimes.

主站蜘蛛池模板: 隆化县| 蛟河市| 新民市| 阿克苏市| 镶黄旗| 余江县| 明溪县| 柘荣县| 桐柏县| 浦城县| 元朗区| 临澧县| 通榆县| 陆河县| 张家川| 黑河市| 凯里市| 聊城市| 乐平市| 察隅县| 唐海县| 江陵县| 大港区| 苍山县| 六枝特区| 曲阜市| 灵山县| 连山| 祁东县| 鹿邑县| 台湾省| 罗山县| 常宁市| 九江市| 南陵县| 梁平县| 禄丰县| 台湾省| 滦南县| 长子县| 清河县|