官术网_书友最值得收藏!

Experiencing the XLOG in action

We will use the transaction log throughout this book, and to give you a deeper insight into how things work on a technical level, we have added this section dealing exclusively with the internal workings of the XLOG machinery. We will avoid going down to the C level as this would be way beyond the scope of this book, but we will provide you with insights that are hopefully deep enough.

Understanding the XLOG records

Changes made to the XLOG are record-based. What does that mean? Let's assume you are adding a row to a table:

test=# INSERT INTO t_test VALUES (1, 'hans');
INSERT 0 1

In this example, we are inserting values into a table containing two columns. For the sake of this example, we want to assume that both columns are indexed.

Remember what you learned before: the purpose of the XLOG is to keep those data files safe. So this operation will trigger a series of XLOG entries. First, the data file (or files) related to the table will be written. Then the entries related to the indexes will be created. Finally, a COMMIT record will be sent to the log.

Not all the XLOG records are equal. Various types of XLOG records exist, for example heap, B-tree, clog, storage, Generalized Inverted Index (GIN), and standby records, to name a few.

XLOG records are chained backwards so, each entry points to the previous entry in the file. In this way, we can be perfectly sure that we have found the end of a record as soon as we have found the pointer to the previous entry.

Making the XLOG deterministic

As you can see, a single change can trigger a larger number of XLOG entries. This is true for all kinds of statements; a large DELETE statement, for instance, can easily cause a million changes. The reason is that PostgreSQL cannot simply put the SQL itself into the log; it really has to log physical changes made to the table.

Why is that so? Just consider the following example:

test=# DELETE FROM t_test WHERE id > random();
DELETE 5335

The random() function has to produce a different output every time it is called, and therefore we cannot just put the SQL into the log because it is not guaranteed to provide us with the same outcome if it is executed during replay.

Making the XLOG reliable

The XLOG itself is one of the most critical and sensitive parts in the entire database instance. Therefore, we have to take special care to make sure that everything possible is done to protect it. In the event of a crash, a database instance is usually doomed if there is no XLOG around.

Internally, PostgreSQL takes some special precautions to handle the XLOG:

  • Using CRC32 checksums
  • Disabling signals
  • Space allocation

First of all, each XLOG record contains a CRC32 checksum. This allows us to check the integrity of the log at startup. It is perfectly possible that the last write operations before a crash were not totally sound any more, and therefore a checksum can definitely help sort problems straightaway. The checksum is automatically computed by PostgreSQL and users don't have to take care of this feature explicitly.

In addition to checksums, PostgreSQL will disable signals temporarily while writing to the XLOG. This gives some extra level of security and reduces the odds of a stupid corner-case problem somewhere.

Finally, PostgreSQL uses a fixed-size XLOG. The size of the XLOG is determined by checkpoint segments as well as by checkpoint_completion_target.

The size of the PostgreSQL transaction log is calculated as follows:

(2 + checkpoint_completion_target) * checkpoint_segments + 1

An alternative way to calculate the size is this:

checkpoint_segments + wal_keep_segments + 1 files

An important thing to note is that if something is of fixed size, it can rarely run out of space.

Tip

In the case of transaction-log-based replication, we can run out of space on the XLOG directory if the transaction log cannot be archived.

You will learn more about this topic in the next chapter.

LSNs and shared buffer interaction

If you want to repair a table, you have to make sure that you do so in the correct order; it would be a disaster if a row was deleted before it actually came into existence. Therefore, the XLOG provides you with the order of all the changes. Internally, this order is reflected through the Logical Sequence Number (LSN). The LSN is essential to the XLOG. Each XLOG entry will be assigned an LSN straight away.

In one of the previous sections, we discussed consistency level. With synchronous_commit set to off, a client will get an approval even if the XLOG record has not been flushed to disk yet. Still, since a change must be reflected in cache and since the XLOG must be written before the data table, the system has to make sure that not all the blocks in the shared buffer can be written to instantly. The LSN will guarantee that we can write blocks from the shared buffer to the data file only if the corresponding change has already made it to the XLOG. Writing to the XLOG is fundamental, and a violation of this rule will certainly lead to problems after a crash.

Debugging the XLOG and putting it all together

Now that we have seen how an XLOG basically works, we can put it all together and actually look into the XLOG. As of PostgreSQL 9.2, it works as follows: we have to compile PostgreSQL from source. Before we do that, we should modify the file located at src/include/pg_config_manual.h. At approximately line 250, we can uncomment WAL_DEBUG and compile as usual. This will then allow us to set a client variable called wal_debug:

test=# SET client_min_messages TO log;
SET
test=# SET wal_debug TO on;
SET

In addition to this, we have to set client_min_messages to make sure that the LOG messages will reach our client.

We are using the following table structure for our test:

test=# \d t_test
    Table "public.t_test"
 Column |  Type   | Modifiers
--------+---------+-----------
id      | integer |
name    | text    |
Indexes:
"idx_id"btree (id)
"idx_name"btree (name)

Only if PostgreSQL has been compiled properly will we see some information about the XLOG on the screen:

test=# INSERT INTO t_test VALUES (1, 'hans');
LOG:  INSERT @ 0/17C4680: prev 0/17C4620; xid 1009; len 36: Heap - insert(init): rel 1663/16384/16394; tid 0/1
LOG:  INSERT @ 0/17C46C8: prev 0/17C4680; xid 1009; len 20: Btree - newroot: rel 1663/16384/16397; root 1 lev 0
LOG:  INSERT @ 0/17C4700: prev 0/17C46C8; xid 1009; len 34: Btree - insert: rel 1663/16384/16397; tid 1/1
LOG:  INSERT @ 0/17C4748: prev 0/17C4700; xid 1009; len 20: Btree - newroot: rel 1663/16384/16398; root 1 lev 0
LOG:  INSERT @ 0/17C4780: prev 0/17C4748; xid 1009; len 34: Btree - insert: rel 1663/16384/16398; tid 1/1
LOG:  INSERT @ 0/17C47C8: prev 0/17C4780; xid 1009; len 12: Transaction - commit: 2013-02-25 18:20:46.633599+01
LOG:  XLOG flush request 0/17C47F8; write 0/0; flush 0/0

Just as stated in this chapter, PostgreSQL will first add a row to the table itself (heap). Then the XLOG will contain all entries that are index-related. Finally, a commit record will be added.

In all, 156 bytes have made it to the XLOG. This is far more than the data we have actually added. Consistency, performance (indexes), and reliability come with a price.

主站蜘蛛池模板: 绥江县| 时尚| 滨州市| 兴宁市| 海兴县| 宾阳县| 克什克腾旗| 乡宁县| 崇左市| 广安市| 武安市| 连城县| 临沭县| 莱西市| 衡南县| 桑植县| 出国| 交口县| 双江| 双江| 麟游县| 临猗县| 临猗县| 留坝县| 临桂县| 五大连池市| 大同市| 曲阳县| 丹棱县| 岳阳市| 河北省| 平利县| 嘉鱼县| 全椒县| 久治县| 宣恩县| 基隆市| 台前县| 汉中市| 临西县| 兰考县|