网络用语牛马

書名： PostgreSQL Replication（Second Edition）
作者名： Hans Jürgen Sch?nig
本章字數(shù)： 821字
更新時間： 2021-07-16 13:33:53

Conflict management

In PostgreSQL, the streaming replication data flows in one direction only. The XLOG is provided by the master to a handful of slaves, which consume the transaction log and provide you with a nice copy of the data. You might wonder how this could ever lead to conflicts. Well, there can be conflicts.

Consider the following scenario: as you know, data is replicated with a very small delay. So, the XLOG ends up at the slave after it has been made on the master. This tiny delay can cause the scenario shown in the following diagram:

Let's assume that a slave starts to read a table. It is a long read operation. In the meantime, the master receives a request to actually drop the table. This is a bit of a problem, because the slave will still need this data to perform its SELECT statement. On the other hand, all the requests coming from the master have to be obeyed under any circumstances. This is a classic conflict.

Tip

In the event of a conflict, PostgreSQL will issue the Terminating connection due to conflict with recovery error message.

There are two ways to solve the problem:

Don't replay the conflicting transaction log before the slave has terminated the operation in question
Kill the query on the slave to resolve the problem

The first option might lead to ugly delays during the replay process, especially if the slave performs fairly long operations. The second option might frequently kill queries on the slave. The database instance cannot know by itself what is best for your application, so you have to find a proper balance between delaying the replay and killing queries.

To find this delicate balance, PostgreSQL offers two parameters in postgresql.conf:

max_standby_archive_delay = 30s
   # max delay before canceling queries
   # when reading WAL from archive;
   # -1 allows indefinite delay
max_standby_streaming_delay = 30s
   # max delay before canceling queries
   # when reading streaming WAL;
   # -1 allows indefinite delay

The max_standby_archive_delay parameter will tell the system how long to suspend the XLOG replay when there is a conflicting operation. In the default setting, the slave will delay the XLOG replay for up to 30 seconds if a conflict is found. This setting is valid if the slave is replaying the transaction log from the files.

The max_standby_streaming_delay parameter tells the slave how long it should suspend the XLOG replay if the XLOG is coming in through streaming. If the time has expired and the conflict is still there, PostgreSQL will cancel the statement due to a problem with recovery, causing a problem in the slave system and resuming the XLOG recovery to catch up. These settings cover a cumulative delay. That is, if there are ten queries pending, they don't get 30 seconds each to delay replication. So, a query might run for 10 milliseconds and get canceled because it was unlucky to be at the end of an existing delay, causing the user to wonder what happened.

In the previous example, we have shown that a conflict may show up if a table is dropped. This is an obvious scenario; however, it is by no means the most common one. It is much more likely that a row is removed by VACUUM or HOT-UPDATE somewhere, causing conflicts on the slave.

Conflicts popping up once in a while can be really annoying and trigger bad behavior of your applications. In other words, if possible, conflicts should be avoided. We have already seen how replaying the XLOG can be delayed. These are not the only mechanisms provided by PostgreSQL. There are two more settings we can use.

The first, and older one of the two, is the setting called vacuum_defer_cleanup_age. It is measured in transactions, and tells PostgreSQL when to remove a line of data. Normally, a line of data can be removed by VACUUM if no more transactions can see the data anymore. The vacuum_defer_cleanup_age parameter tells VACUUM to not clean up a row immediately but wait for some more transactions before it can go away. Deferring cleanups will keep a row around a little longer than needed. This helps the slave to complete queries that are relying on old rows. Especially if your slave is the one handling some analytical work, this will help a lot in making sure that no queries have to die in vain.

One more method of controlling conflicts is by making use of hot_standby_feedback. The idea is that a slave reports transaction IDs to the master, which in turn can use this information to defer VACUUM. This is one of the easiest methods of avoiding cleanup conflicts on the slave.

Tip

Keep in mind, however, that deferring cleanups can lead to increased space consumption and some other side effects, which have to be kept in mind under any circumstances. The effect is basically the same as running a long transaction on the master.

官术网_书友最值得收藏!

PostgreSQL Replication（Second Edition）

Conflict management

Tip

Tip