
Making streaming-only replication more robust

The first thing a slave has to do when connecting to a master is to catch up. But can this always work? We have already seen that we can use a mixed setup consisting of a streaming-based and a file-based component, which gives us some extra security in case streaming does not work.

In many real-world scenarios, maintaining two ways of transporting the XLOG is too complicated; often, streaming alone is enough. The point is that, in a normal setup as described earlier, the master is free to throw the XLOG away as soon as it is no longer needed for crash recovery on the master itself. Depending on your checkpoint configuration, the XLOG might therefore stay around for quite a while, or only for a short time. The trouble is that when your slave reconnects to the master, the XLOG it needs may already have been recycled, and the slave cannot resync itself in this scenario. You might find this a little annoying, because it implicitly limits the maximum downtime of a slave to your master's checkpoint behavior.

Two choices are available to solve the problem:

  • wal_keep_segments: Keep some XLOG files on the master
  • Physical replication slots: Teach the master to recycle the XLOG only after it has been consumed by the replica

Using wal_keep_segments

To make your setup much more robust, we recommend making heavy use of wal_keep_segments. The idea of this postgresql.conf setting (on the master) is to teach the master to keep more XLOG files around than theoretically necessary. If you set this variable to 1000, the master will keep roughly 16 GB (1,000 segments of 16 MB each) of XLOG beyond what it needs itself. In other words, your slave can fall up to 16 GB of changes behind the master and still catch up. This greatly increases the odds that a slave can rejoin the cluster without having to be completely resynced from scratch. For a 500 MB database this is hardly worth mentioning, but if your setup has to hold hundreds of gigabytes or terabytes, it becomes an enormous advantage. Producing a base backup of a 20 TB instance is a lengthy process; you might not want to do this too often, and you definitely won't want to do it over and over again.
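As a sketch, the relevant entry in the master's postgresql.conf might look as follows (the value is only an example; size it according to your workload):

# postgresql.conf on the master
# keep 1000 extra XLOG segments (16 MB each, roughly 16 GB in total)
wal_keep_segments = 1000

A plain reload is enough to activate the change; no restart of the master is needed.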

Tip

If you have to refresh a large base backup, it can be beneficial to update it incrementally using rsync together with the traditional (low-level) method of taking base backups, as sketched below.
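A minimal sketch of this approach, assuming the example host name used later in this section and default data directory paths, could look like this:

# run on the slave while PostgreSQL is shut down there;
# host name, user, and paths are examples only
psql -h master.postgresql-support.de -U hans -d postgres \
    -c "SELECT pg_start_backup('refresh', true);"
rsync -av --delete --exclude=pg_xlog/ \
    master.postgresql-support.de:/var/lib/pgsql/data/ /var/lib/pgsql/data/
psql -h master.postgresql-support.de -U hans -d postgres \
    -c "SELECT pg_stop_backup();"

Because rsync only transfers changed blocks, refreshing an existing copy is usually far cheaper than shipping the whole instance again.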

What are reasonable values for wal_keep_segments? As always, this largely depends on your workload. From experience, we can tell that a multi-GB implicit archive on the master is definitely an investment worth considering, while very low values for wal_keep_segments are risky and hardly worth the effort. Nowadays, disk space is usually cheap; small systems might not need this setting anyway, and large ones should have enough spare capacity to absorb the extra requirements. Personally, I am always in favor of keeping at least some extra XLOG segments around.

Utilizing replication slots

PostgreSQL 9.4 introduced a more sophisticated solution to the problem of deleted XLOG: physical replication slots. As already outlined earlier in this book, a replication slot makes sure that the master deletes XLOG only when it has been safely consumed by the replica, which is exactly what is needed for the cleanup problem outlined in this section.

The question now is: how can a replication slot be used? Basically, it is very simple. All that has to be done is to create the replication slot on the master and tell the slave which slot to use through recovery.conf.

Here is how it works on the master:

postgres=# SELECT * FROM pg_create_physical_replication_slot('repl_slot');
 slot_name | xlog_position
-----------+---------------
 repl_slot |
(1 row)

postgres=# SELECT * FROM pg_replication_slots;
 slot_name | slot_type | datoid | database | active | xmin | restart_lsn
-----------+-----------+--------+----------+--------+------+-------------
 repl_slot | physical  |        |          | f      |      |
(1 row)

Once the base backup has been taken, the slave can be configured easily:

standby_mode = 'on'
primary_conninfo = 'host=master.postgresql-support.de port=5432 user=hans password=hanspass'
primary_slot_name = 'repl_slot'

The configuration is just the same as if there were no replication slots; the only change is the addition of the primary_slot_name variable. The slave passes the name of the replication slot to the master, and the master then knows when it is safe to recycle the transaction log. As mentioned already, if a slave is not in use anymore, make sure that its replication slot is properly deleted; otherwise, the master will retain XLOG indefinitely and eventually run out of disk space. The problem is that this failure mode is insidious: slaves, being optional, are not always monitored as well as they should be. It is therefore a good idea to regularly compare pg_stat_replication with pg_replication_slots and investigate any mismatch.
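A simple check along these lines can serve as a starting point; a slot that shows active = f while its standby is supposed to be streaming deserves a closer look:

postgres=# SELECT slot_name, restart_lsn
           FROM pg_replication_slots
           WHERE NOT active;

Once a slave has been decommissioned for good, its slot (here the slot name used in the earlier example) can be removed on the master:

postgres=# SELECT pg_drop_replication_slot('repl_slot');

From that point on, the master is again free to recycle its transaction log as usual.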
