官术网_书友最值得收藏!

Grouping columns the right way

Some people might wonder what is meant by the title of this section. Does it make a difference which order columns are aligned in? This might be a bit surprising, but it does. Even if a table contains the same data, its size on the disk might vary depending on the order of columns. Here is an example:

test=# CREATE TABLE t_test (  i1 int,
          i2 int,
          i3 int,
          v1 varchar(100),
          v2 varchar(100),
          v3 varchar(100)
);
CREATE TABLE
test=# INSERT INTO t_test SELECT 10, 20, 30,
  'abcd', 'abcd', 'abcd'
  FROM generate_series(1, 10000000);
INSERT 0 10000000

A table with three columns has been created. First of all, there are three integer columns. Then some varchar columns are added. In the second statement, 10 million rows are added. The generate_series command is a nice way to generate a list of numbers. In this example, the output of generate_series is not used. I am just utilizing the function to repeat the static data in the SELECT clause.

Now the size of the table can be checked:

test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 574 MB
(1 row)

The pg_relation_size returns the size of a table in bytes. In the case of a large relation, this is somewhat unhandy because users may easily end up with very large, hard-to-read numbers. Therefore, the return value of pg_relation_size should be wrapped into pg_size_pretty, which makes the size a lot more useful, as data is returned in a more human-readable format.

Let's try the same example again. This time, however, the order of the columns is changed. The rest of the example stays exactly the same:

test=# CREATE TABLE t_test (  v1 varchar(100),
          i1 int,
          v2 varchar(100),
          i2 int,
          v3 varchar(100),
          i3 int
);
CREATE TABLE
test=# INSERT INTO t_test SELECT 'abcd', 10, 'abcd',
      20, 'abcd', 30
  FROM generate_series(1, 10000000);
INSERT 0 10000000
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 651 MB
(1 row)

The table has grown considerably, even though the data inside the table is exactly the same. The reason for this problem is called alignment and can be found deep inside the code. The theory is as follows: a CPU has a hard time if a field does not start at a multiplier of CPU word-size. Therefore, PostgreSQL will accordingly align data physically.

The most important point here is that it can make sense to group columns with similar data types next to each other. Of course, the outcome and the potential difference in size greatly depend on the content. If "abc" was used instead of "abcd" in this example, the results would not have shown any difference; both versions of the table would have had a fixed size of 498 MB.

Tip

Note that different types of CPUs (x86_32, x86_64, Sparc, ARM, and so on.) may not lead to the same results here.

主站蜘蛛池模板: 太仓市| 阳山县| 韩城市| 桦南县| 高雄市| 漳浦县| 于都县| 渭源县| 屯门区| 沅陵县| 婺源县| 肥东县| 高平市| 湟源县| 沾化县| 呼伦贝尔市| 福清市| 博乐市| 石渠县| 东宁县| 安徽省| 封开县| 吴桥县| 淮阳县| 沙河市| 南昌县| 兴仁县| 永清县| 灌云县| 余江县| 宜章县| 曲靖市| 张掖市| 嘉峪关市| 巴林左旗| 安达市| 西宁市| 彩票| 北票市| 鸡东县| 马鞍山市|