
Making streaming-only replication more robust

The first thing a slave has to do when connecting to a master is to play catch-up. But can this always work? We have already seen that we can use a mixed setup consisting of a streaming-based and a file-based component. This gives us some extra security in case streaming does not work.

In many real-world scenarios, maintaining two ways of transporting the XLOG might be too complicated, and in many cases it is enough to have just streaming. The point is that in a normal setup, as described already, the master can throw the XLOG away as soon as it is no longer needed to repair the master itself. Depending on your checkpoint configuration, the XLOG might stay around for quite a while or only for a short time. The trouble is that when your slave connects to the master, the desired XLOG may already be gone, and the slave cannot resync itself in this scenario. You might find this a little annoying, because it implicitly ties the maximum tolerable downtime of your slave to your master's checkpoint behavior.
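
How long XLOG survives on an unmodified master is governed by the checkpoint settings in postgresql.conf. The following values are purely illustrative (9.4-era parameter names) and only meant to show which knobs bound the retention window:

checkpoint_segments = 32            # roughly bounds the XLOG recycled after a checkpoint
checkpoint_timeout = 5min           # a checkpoint is forced at least this often
checkpoint_completion_target = 0.9  # spreads checkpoint I/O over the interval

With settings like these, a slave that stays away for more than a few checkpoint cycles cannot expect its XLOG to still be on the master.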

Two choices are available to solve the problem:

  • wal_keep_segments: Keep some XLOG files on the master
  • Physical replication slots: Teach the master to recycle the XLOG only when it has been consumed

Using wal_keep_segments

To make your setup much more robust, we recommend making heavy use of wal_keep_segments. The idea of this postgresql.conf setting (on the master) is to make the master keep more XLOG files around than are theoretically necessary. If you set this variable to 1000, the master will keep up to 16 GB (1,000 segments of 16 MB each) more XLOG than it strictly needs. In other words, your slave can fall 16 GB of changes behind the master and still catch up. This greatly increases the odds that a slave can rejoin the cluster without having to completely resync itself from scratch. For a 500 MB database this is hardly worth mentioning, but if your setup has to hold hundreds of gigabytes or terabytes, it becomes an enormous advantage. Producing a base backup of a 20 TB instance is a lengthy process; you might not want to do this too often, and you definitely won't want to do it over and over again.
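
A minimal sketch of the corresponding postgresql.conf entry on the master (the value is illustrative and should be sized to your workload):

wal_keep_segments = 1000    # 1000 segments x 16 MB = 16 GB of extra XLOG kept for slaves

The parameter can be activated with a plain configuration reload; no restart is required.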

Tip

If you want to refresh a large base backup, it might be beneficial to update it incrementally using rsync together with the traditional (pg_start_backup/pg_stop_backup) method of taking base backups.
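
A minimal sketch of such an incremental refresh, assuming the data directory lives in /var/lib/pgsql/data on both machines (hostname, user, and paths are illustrative):

# enter backup mode on the master ('true' requests an immediate checkpoint)
psql -h master.postgresql-support.de -U hans -c "SELECT pg_start_backup('rsync_refresh', true)"
# copy only the files that changed; pg_xlog is excluded and replayed via streaming/archive
rsync -av --delete --exclude=pg_xlog \
    master.postgresql-support.de:/var/lib/pgsql/data/ /var/lib/pgsql/data/
# leave backup mode again
psql -h master.postgresql-support.de -U hans -c "SELECT pg_stop_backup()"

Because rsync transfers only the delta, refreshing an old base backup this way is usually far faster than producing a new one from scratch.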

What are reasonable values for wal_keep_segments? As always, this largely depends on your workload. From experience, we can tell that a multi-GB implicit archive on the master is definitely an investment worth considering. Very low values for wal_keep_segments are risky and save little. Nowadays, space is usually cheap: small systems might not need this setting, and large ones should have sufficient spare capacity to absorb the extra requirements. Personally, I am always in favor of keeping at least some extra XLOG segments around.
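
To get a feeling for your own XLOG volume, you can sample the current write position twice and compute the difference. A sketch using 9.4-era functions (the two positions shown are made-up values):

-- run once, note the value, wait for a representative interval, then run again
SELECT pg_current_xlog_location();
-- bytes of XLOG written between the two sampled positions:
SELECT pg_xlog_location_diff('1/23000000', '1/20000000');

Divide the result by the sampling interval to estimate XLOG bytes per second, and then size wal_keep_segments so that it covers the longest slave outage you want to survive.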

Utilizing replication slots

With the introduction of PostgreSQL 9.4, a more sophisticated solution to the problem of deleted XLOG became available: physical replication slots. As already outlined earlier in this book, replication slots make sure that the master deletes XLOG only after it has been safely consumed by the replica, which is exactly what the cleanup problem described in this section calls for.

The question now is: how can a replication slot be used? Basically, it is very simple. All that has to be done is to create the replication slot on the master and tell the slave, through recovery.conf, which slot to use.

Here is how it works on the master:

postgres=# SELECT * FROM pg_create_physical_replication_slot('repl_slot');
 slot_name | xlog_position
-----------+---------------
 repl_slot |
(1 row)

postgres=# SELECT * FROM pg_replication_slots;
 slot_name | slot_type | datoid | database | active | xmin | restart_lsn
-----------+-----------+--------+----------+--------+------+-------------
 repl_slot | physical  |        |          | f      |      |
(1 row)

Once the base backup has been taken, the slave can be configured easily:

standby_mode = 'on'
primary_conninfo = 'host=master.postgresql-support.de port=5432 user=hans password=hanspass'
primary_slot_name = 'repl_slot'

The configuration is just as if there were no replication slots; the only change is the added primary_slot_name variable. The slave passes the name of the replication slot to the master, and the master then knows which transaction log it can safely recycle. As mentioned already, if a slave is decommissioned, make sure that its replication slot is properly deleted; otherwise the master retains XLOG for it indefinitely and will eventually run out of disk space. This failure mode is insidious, because slaves, being optional, are often not monitored as closely as they should be. It is therefore a good idea to regularly compare pg_stat_replication with pg_replication_slots and investigate any mismatch.
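
Such a check might look as follows (a sketch; the WHERE clause simply singles out slots that currently have no consumer attached, since those are the ones pinning XLOG on the master):

-- slots nobody is currently streaming from; each of these pins XLOG
SELECT slot_name, slot_type, active, restart_lsn
FROM pg_replication_slots
WHERE NOT active;

-- once a slave is gone for good, drop its slot explicitly
SELECT pg_drop_replication_slot('repl_slot');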
