- PostgreSQL Replication(Second Edition)
- Hans Jürgen Sch?nig
- 1229字
- 2021-07-16 13:33:50
Experiencing the XLOG in action
We will use the transaction log throughout this book, and to give you a deeper insight into how things work on a technical level, we have added this section dealing exclusively with the internal workings of the XLOG machinery. We will avoid going down to the C level as this would be way beyond the scope of this book, but we will provide you with insights that are hopefully deep enough.
Understanding the XLOG records
Changes made to the XLOG are record-based. What does that mean? Let's assume you are adding a row to a table:
test=# INSERT INTO t_test VALUES (1, 'hans');
INSERT 0 1
In this example, we are inserting values into a table containing two columns. For the sake of this example, we want to assume that both columns are indexed.
Remember what you learned before: the purpose of the XLOG is to keep those data files safe. So this operation will trigger a series of XLOG entries. First, the data file (or files) related to the table will be written. Then the entries related to the indexes will be created. Finally, a COMMIT
record will be sent to the log.
Not all the XLOG records are equal. Various types of XLOG records exist, for example heap, B-tree, clog, storage, Generalized Inverted Index (GIN), and standby records, to name a few.
XLOG records are chained backwards so, each entry points to the previous entry in the file. In this way, we can be perfectly sure that we have found the end of a record as soon as we have found the pointer to the previous entry.
Making the XLOG deterministic
As you can see, a single change can trigger a larger number of XLOG entries. This is true for all kinds of statements; a large DELETE
statement, for instance, can easily cause a million changes. The reason is that PostgreSQL cannot simply put the SQL itself into the log; it really has to log physical changes made to the table.
Why is that so? Just consider the following example:
test=# DELETE FROM t_test WHERE id > random();
DELETE 5335
The random()
function has to produce a different output every time it is called, and therefore we cannot just put the SQL into the log because it is not guaranteed to provide us with the same outcome if it is executed during replay.
Making the XLOG reliable
The XLOG itself is one of the most critical and sensitive parts in the entire database instance. Therefore, we have to take special care to make sure that everything possible is done to protect it. In the event of a crash, a database instance is usually doomed if there is no XLOG around.
Internally, PostgreSQL takes some special precautions to handle the XLOG:
- Using CRC32 checksums
- Disabling signals
- Space allocation
First of all, each XLOG record contains a CRC32 checksum. This allows us to check the integrity of the log at startup. It is perfectly possible that the last write operations before a crash were not totally sound any more, and therefore a checksum can definitely help sort problems straightaway. The checksum is automatically computed by PostgreSQL and users don't have to take care of this feature explicitly.
In addition to checksums, PostgreSQL will disable signals temporarily while writing to the XLOG. This gives some extra level of security and reduces the odds of a stupid corner-case problem somewhere.
Finally, PostgreSQL uses a fixed-size XLOG. The size of the XLOG is determined by checkpoint segments as well as by checkpoint_completion_target
.
The size of the PostgreSQL transaction log is calculated as follows:
(2 + checkpoint_completion_target) * checkpoint_segments + 1
An alternative way to calculate the size is this:
checkpoint_segments + wal_keep_segments + 1 files
An important thing to note is that if something is of fixed size, it can rarely run out of space.
Tip
In the case of transaction-log-based replication, we can run out of space on the XLOG directory if the transaction log cannot be archived.
You will learn more about this topic in the next chapter.
LSNs and shared buffer interaction
If you want to repair a table, you have to make sure that you do so in the correct order; it would be a disaster if a row was deleted before it actually came into existence. Therefore, the XLOG provides you with the order of all the changes. Internally, this order is reflected through the Logical Sequence Number (LSN). The LSN is essential to the XLOG. Each XLOG entry will be assigned an LSN straight away.
In one of the previous sections, we discussed consistency level. With synchronous_commit
set to off
, a client will get an approval even if the XLOG record has not been flushed to disk yet. Still, since a change must be reflected in cache and since the XLOG must be written before the data table, the system has to make sure that not all the blocks in the shared buffer can be written to instantly. The LSN will guarantee that we can write blocks from the shared buffer to the data file only if the corresponding change has already made it to the XLOG. Writing to the XLOG is fundamental, and a violation of this rule will certainly lead to problems after a crash.
Debugging the XLOG and putting it all together
Now that we have seen how an XLOG basically works, we can put it all together and actually look into the XLOG. As of PostgreSQL 9.2, it works as follows: we have to compile PostgreSQL from source. Before we do that, we should modify the file located at src/include/pg_config_manual.h
. At approximately line 250, we can uncomment WAL_DEBUG
and compile as usual. This will then allow us to set a client variable called wal_debug
:
test=# SET client_min_messages TO log; SET test=# SET wal_debug TO on; SET
In addition to this, we have to set client_min_messages
to make sure that the LOG
messages will reach our client.
We are using the following table structure for our test:
test=# \d t_test
Table "public.t_test"
Column | Type | Modifiers
--------+---------+-----------
id | integer |
name | text |
Indexes:
"idx_id"btree (id)
"idx_name"btree (name)
Only if PostgreSQL has been compiled properly will we see some information about the XLOG on the screen:
test=# INSERT INTO t_test VALUES (1, 'hans');
LOG: INSERT @ 0/17C4680: prev 0/17C4620; xid 1009; len 36: Heap - insert(init): rel 1663/16384/16394; tid 0/1
LOG: INSERT @ 0/17C46C8: prev 0/17C4680; xid 1009; len 20: Btree - newroot: rel 1663/16384/16397; root 1 lev 0
LOG: INSERT @ 0/17C4700: prev 0/17C46C8; xid 1009; len 34: Btree - insert: rel 1663/16384/16397; tid 1/1
LOG: INSERT @ 0/17C4748: prev 0/17C4700; xid 1009; len 20: Btree - newroot: rel 1663/16384/16398; root 1 lev 0
LOG: INSERT @ 0/17C4780: prev 0/17C4748; xid 1009; len 34: Btree - insert: rel 1663/16384/16398; tid 1/1
LOG: INSERT @ 0/17C47C8: prev 0/17C4780; xid 1009; len 12: Transaction - commit: 2013-02-25 18:20:46.633599+01
LOG: XLOG flush request 0/17C47F8; write 0/0; flush 0/0
Just as stated in this chapter, PostgreSQL will first add a row to the table itself (heap). Then the XLOG will contain all entries that are index-related. Finally, a commit record will be added.
In all, 156 bytes have made it to the XLOG. This is far more than the data we have actually added. Consistency, performance (indexes), and reliability come with a price.
- Java面向?qū)ο笏枷肱c程序設(shè)計(jì)
- HTML5+CSS3+JavaScript Web開(kāi)發(fā)案例教程(在線實(shí)訓(xùn)版)
- Rust Essentials(Second Edition)
- Java Web程序設(shè)計(jì)任務(wù)教程
- Swift細(xì)致入門(mén)與最佳實(shí)踐
- R語(yǔ)言數(shù)據(jù)可視化:科技圖表繪制
- 微信小程序開(kāi)發(fā)實(shí)戰(zhàn):設(shè)計(jì)·運(yùn)營(yíng)·變現(xiàn)(圖解案例版)
- Learning TypeScript
- Enterprise Application Architecture with .NET Core
- Web前端開(kāi)發(fā)精品課:HTML5 Canvas開(kāi)發(fā)詳解
- HTML5+jQuery Mobile移動(dòng)應(yīng)用開(kāi)發(fā)
- HTML 5與CSS 3權(quán)威指南(第4版·下冊(cè))
- Mastering Android NDK
- Drupal 8 Module Development
- Hadoop實(shí)戰(zhàn)