- PostgreSQL Replication (Second Edition)
- Hans-Jürgen Schönig
Taking base backups
In the previous section, you saw that enabling archiving takes just a handful of lines and offers a great deal of flexibility. In this section, we will see how to create a so-called base backup, which can be used to apply XLOG later on. A base backup is an initial copy of data.
Tip
Keep in mind that the XLOG itself is more or less worthless. It is only useful in combination with the initial base backup.
In PostgreSQL, there are two main options to create an initial base backup:

- Using pg_basebackup
- Traditional methods based on copy/rsync

Note that pg_dump cannot be used for a base backup because a binary copy of the data is required. The pg_dump tool provides a textual representation of the data, not a binary one, so it is not feasible here.
The following two sections will explain in detail how a base backup can be created.
Using pg_basebackup
The first and most common method of creating a backup of an existing server is by running a command called pg_basebackup, which was first introduced in PostgreSQL 9.1.0. Previously, other techniques, described later in this book, were used. Basically, pg_basebackup is able to fetch a database base backup directly over a database connection. When executed on the slave, pg_basebackup will connect to the database server of your choice and copy all the data files in the data directory to your machine. There is no need to log in to the box anymore; all it takes is one line of code, and pg_basebackup will do the rest for you.
In this example, we will assume that we want to take a base backup of a host called sample.postgresql-support.de. The following steps must be performed:

- Modify pg_hba.conf to allow replication
- Signal the master to take the pg_hba.conf changes into account
- Call pg_basebackup
Modifying pg_hba.conf
To allow remote boxes to log in to a PostgreSQL server and stream XLOG, you have to explicitly allow replication.
In PostgreSQL, there is a file called pg_hba.conf that tells the server which boxes are allowed to connect using which type of credentials. Entire IP ranges can be allowed or simply discarded through pg_hba.conf.
To enable replication, we have to add one line for each IP range we want to allow. The following listing contains an example of a valid configuration:
```
# TYPE  DATABASE     USER  ADDRESS          METHOD
host    replication  all   192.168.0.34/32  md5
```
In this case, we allow replication connections from 192.168.0.34. The /32 netmask simply restricts the range to a single server in our case. We have decided to use MD5 as our authentication method, which means that pg_basebackup has to supply a password to the server. If you are doing this in a non-security-critical environment, using trust as the authentication method might also be an option.
Tip
What happens if you actually have a database called replication in your system? Basically, setting the database field to replication will just configure your streaming behavior. If you want to put in rules dealing with the database called replication, you have to quote the database name as follows: "replication". However, we strongly advise you not to do this kind of trickery, to avoid confusion.
Signaling the master server
Once pg_hba.conf has been changed, we can tell PostgreSQL to reload the configuration. There is no need to restart the database completely. We have three options to make PostgreSQL reload pg_hba.conf:

- By running a SQL command: SELECT pg_reload_conf();
- By sending a signal to the master: kill -HUP 4711 (with 4711 being the process ID of the master)
- By calling pg_ctl: pg_ctl -D $PGDATA reload (with $PGDATA being the data directory of your database instance)
Once we have told the server acting as the data source to accept streaming connections, we can move forward and run pg_basebackup.
pg_basebackup – basic features
The pg_basebackup tool is a very simple-to-use command-line tool for PostgreSQL. It has to be called on the target system, and it will provide you with a ready-to-use base backup, which is ready to consume the transaction log for Point-in-time Recovery.
The syntax of pg_basebackup is as follows:

```
hs@chantal:~$ pg_basebackup --help
pg_basebackup takes a base backup of a running PostgreSQL server.

Usage:
  pg_basebackup [OPTION]...

Options controlling the output:
  -D, --pgdata=DIRECTORY receive base backup into directory
  -F, --format=p|t       output format (plain (default), tar)
  -r, --max-rate=RATE    maximum transfer rate to transfer data directory
                         (in kB/s, or use suffix "k" or "M")
  -R, --write-recovery-conf
                         write recovery.conf after backup
  -T, --tablespace-mapping=OLDDIR=NEWDIR
                         relocate tablespace in OLDDIR to NEWDIR
  -x, --xlog             include required WAL files in backup (fetch mode)
  -X, --xlog-method=fetch|stream
                         include required WAL files with specified method
  --xlogdir=XLOGDIR      location for the transaction log directory
  -z, --gzip             compress tar output
  -Z, --compress=0-9     compress tar output with given compression level

General options:
  -c, --checkpoint=fast|spread
                         set fast or spread checkpointing
  -l, --label=LABEL      set backup label
  -P, --progress         show progress information
  -v, --verbose          output verbose messages
  -V, --version          output version information, then exit
  -?, --help             show this help, then exit

Connection options:
  -d, --dbname=CONNSTR   connection string
  -h, --host=HOSTNAME    database server host or socket directory
  -p, --port=PORT        database server port number
  -s, --status-interval=INTERVAL
                         time between status packets sent to server (in seconds)
  -U, --username=NAME    connect as specified database user
  -w, --no-password      never prompt for password
  -W, --password         force password prompt (should happen automatically)
```
A basic call to pg_basebackup would look like this:

```
chantal:dbhs$ pg_basebackup -D /target_directory \
    -h sample.postgresql-support.de
```

In this example, we fetch the base backup from sample.postgresql-support.de and put it into our local directory called /target_directory. It takes just this simple line to copy an entire database instance to the target system.
When we create a base backup as shown in this section, pg_basebackup will connect to the server and wait for a checkpoint to happen before the actual copy process starts. This is necessary because the replay process will start exactly at this point in the XLOG. The problem is that it might take a while until a checkpoint kicks in; pg_basebackup does not enforce a checkpoint on the source server straightaway, so as to make sure that normal operations are not disturbed.
Tip
If you don't want to wait for a checkpoint, consider using --checkpoint=fast. This will enforce an immediate checkpoint, and pg_basebackup will start copying instantly.
By default, a plain base backup will be created. It will consist of all the files and directories found on the source server. If the base backup has to be stored on tape, we recommend that you give --format=t a try. It will automatically create a tar archive (maybe on a tape). If you want to move data to a tape, you can easily save an intermediate step this way. When using tar, it is usually quite beneficial to combine it with --gzip to reduce the size of the base backup on disk.
Tip
There is also a way to see a progress bar while performing the base backup, but we don't recommend using this option (--progress), because it requires pg_basebackup to determine the size of the source instance first, which can be costly.
Backup throttling
If your master has only a weak I/O subsystem, it can happen that a full-blown base backup creates so much load on the master's I/O system that normal transactions begin to suffer. Performance goes down and latency goes up. This is clearly undesirable. Therefore, pg_basebackup allows backup speeds to be reduced. By reducing the throughput of the backup tool, the master will have more capacity left to serve normal requests.

To reduce the speed of the backup, you can use the -r (--max-rate) command-line switch. The expected unit is kB/s, but megabytes are also possible via the M suffix. Just pass the desired throughput as a parameter and stretch the backup to make sure that your master server does not suffer. The optimal value for this setting highly depends on your hardware infrastructure and performance requirements, so administrators have to determine the right speed as needed.
pg_basebackup – self-sufficient backups
Usually, a base backup without XLOG is pretty worthless. This is because the base backup is taken while the master is fully operational. While the backup is taken, the storage files in the database instance might have been modified heavily. The purpose of the XLOG is to fix those potential issues in the data files reliably.
But what if we want to create a base backup that can live without (explicitly archived) XLOG? In such a case, we can use the --xlog-method=stream option. If this option is chosen, pg_basebackup will not only copy the data as it is, but also stream the XLOG created during the base backup to our destination server. This provides us with just enough XLOG to start the backed-up instance directly. The backup is self-sufficient and does not need additional XLOG files. This is not Point-in-time Recovery, but it can come in handy if there is trouble. Having a base backup that can be started right away is usually a good thing, and it comes at a fairly low cost.
Tip
Note that --xlog-method=stream will require two database connections to the source server, not just one. You have to keep that in mind when adjusting max_wal_senders on the source server.
If you are planning to use Point-in-time Recovery and if there is absolutely no need to start the backup as it is, you can safely skip the XLOG and save some space this way (default mode).
Making use of traditional methods to create base backups
These days, pg_basebackup is the most common way of getting an initial copy of a database server. This has not always been the case. Traditionally, a different method was used, which worked as follows:

- Call SELECT pg_start_backup('some label');
- Copy all data files to the remote box through rsync or any other means
- Run SELECT pg_stop_backup();
The main advantage of this old method is that there is no need to open a database connection, and no need to configure the XLOG streaming infrastructure on the source server.
Another main advantage is that you can make use of features such as ZFS snapshots, or similar means, which can dramatically reduce the amount of I/O needed to create an initial backup.
Tip
Once you have called pg_start_backup, there is no need to hurry. It is not necessary, and not even desirable, to leave backup mode soon. Nothing will happen if you stay in backup mode for days. PostgreSQL will archive the transaction log as usual, and the user won't face any kind of downside. Of course, it is a bad habit not to close backups soon and properly. However, the way PostgreSQL works internally does not change while a base backup is running. There is nothing filling up, no disk I/O being delayed, or anything of this sort. A base backup just copies files to the slave, and is therefore not invasive at all.
Tablespace issues
If you happen to use more than one tablespace, pg_basebackup will handle this just fine, provided the filesystem layout on the target box is identical to the one on the master. However, if your target system does not use the same filesystem layout, there is a bit more work to do. Using the traditional way of creating the base backup might be beneficial in this case.

If you are using --format=t (for tar), you will be provided with one tar file per tablespace.
Keeping an eye on the network bandwidth
In the previous section, it has already been noted that a weak server can easily fall victim to fast backups consuming all of its I/O capacity. The solution to this problem is the --max-rate setting. Throttling the backup is also highly recommended when using rsync as the backup method.
Tip
If you want to limit rsync to, say, 20 MB/s, you can simply use rsync --bwlimit=20000. This will definitely cause the creation of the base backup to take longer, but it will make sure that your client apps don't face problems. In general, we recommend a dedicated network connecting the master and slave to make sure that a base backup does not affect normal operations.
Of course, you can use any other tools to copy data and achieve similar results.
Tip
If you are using gzip compression with --gzip, it can work as an implicit speed brake. However, this is mainly a workaround.