
Taking base backups

In the previous section, you saw that enabling archiving takes just a handful of lines and offers a great deal of flexibility. In this section, we will see how to create a so-called base backup, which can be used to apply XLOG later on. A base backup is an initial copy of data.

Tip

Keep in mind that the XLOG itself is more or less worthless. It is only useful in combination with the initial base backup.

In PostgreSQL, there are two main options to create an initial base backup:

  • Using pg_basebackup
  • Traditional methods based on copy/rsync

Note that pg_dump cannot be used for a base backup because a binary copy of the data is required. pg_dump provides a textual representation of the data, not a binary one, so it is not suitable here.

The following two sections will explain in detail how a base backup can be created.

Using pg_basebackup

The first and most common method of creating a backup of an existing server is by running a command called pg_basebackup, which was first introduced in PostgreSQL 9.1.0. Before that, only the traditional techniques described later in this book were available. Basically, pg_basebackup is able to fetch a database base backup directly over a database connection. When executed on the slave, pg_basebackup will connect to the database server of your choice and copy all the data files in the data directory over to your machine. There is no need to log in to the box anymore; all it takes is one line of code, and pg_basebackup will do the rest for you.

In this example, we will assume that we want to take a base backup of a host called sample.postgresql-support.de. The following steps must be performed:

  1. Modify pg_hba.conf to allow replication
  2. Signal the master to take the pg_hba.conf changes into account
  3. Call pg_basebackup
Modifying pg_hba.conf

To allow remote boxes to log in to a PostgreSQL server and stream XLOG, you have to explicitly allow replication.

In PostgreSQL, there is a file called pg_hba.conf that tells the server which boxes are allowed to connect using which type of credentials. Entire IP ranges can be allowed or simply discarded through pg_hba.conf.

To enable replication, we have to add one line for each IP range we want to allow. The following listing contains an example of a valid configuration:

# TYPE  DATABASE     USER    ADDRESS           METHOD
host  replication  all     192.168.0.34/32      md5

In this case, we allow replication connections from 192.168.0.34. The /32 netmask means that exactly one host matches (a single server in our case). We have decided to use MD5 as our authentication method, which means that pg_basebackup has to supply a password to the server. If you are doing this in a non-security-critical environment, using trust as the authentication method might also be an option.
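Appending such an entry can of course be scripted. The following sketch works against a throwaway directory created on the spot; on a real system, PGDATA would be your actual data directory, and the server must be told to reload the configuration afterwards:

```shell
# Demo only: append a replication entry to a pg_hba.conf copy.
# PGDATA here is a temporary directory, not a live instance.
PGDATA=$(mktemp -d)
touch "$PGDATA/pg_hba.conf"
cat >> "$PGDATA/pg_hba.conf" <<'EOF'
# TYPE  DATABASE     USER  ADDRESS          METHOD
host    replication  all   192.168.0.34/32  md5
EOF
# Verify that exactly one replication rule was added:
grep -c '^host.*replication' "$PGDATA/pg_hba.conf"
```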

Tip

What happens if you actually have a database called replication in your system? Basically, setting the database to replication will just configure your streaming behavior. If you want to put in rules dealing with the database called replication, you have to quote the database name as follows: "replication". However, we strongly advise you not to do this kind of trickery to avoid confusion.

Signaling the master server

Once pg_hba.conf has been changed, we can tell PostgreSQL to reload the configuration. There is no need to restart the database completely. We have three options to make PostgreSQL reload pg_hba.conf:

  • By running a SQL command: SELECT pg_reload_conf();
  • By sending a signal to the master: kill -HUP 4711 (with 4711 being the process ID of the master)
  • By calling pg_ctl: pg_ctl -D $PGDATA reload (with $PGDATA being the data directory of your database instance)
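All three variants end up delivering SIGHUP to the postmaster, whose PID is stored on the first line of postmaster.pid inside the data directory. The signaling variant can therefore be scripted, as the following sketch shows with a mock file (on a live system, point PGDATA at the real data directory and actually send the signal):

```shell
# Demo with a mock postmaster.pid; no server is contacted here.
PGDATA=$(mktemp -d)
printf '4711\n/tmp/pgdata\n' > "$PGDATA/postmaster.pid"   # mock file
# The postmaster's PID is the first line of postmaster.pid:
PG_PID=$(head -1 "$PGDATA/postmaster.pid")
echo "kill -HUP $PG_PID"   # on a real box: kill -HUP "$PG_PID"
```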

Once we have told the server acting as data source to accept streaming connections, we can move forward and run pg_basebackup.

pg_basebackup – basic features

The pg_basebackup tool is a simple-to-use command-line tool for PostgreSQL. It has to be called on the target system and will provide you with a ready-to-use base backup, prepared to consume the transaction log for Point-in-time Recovery.

The syntax of pg_basebackup is as follows:

hs@chantal:~$ pg_basebackup --help
pg_basebackup takes a base backup of a running PostgreSQL server.

Usage:
 pg_basebackup [OPTION]...

Options controlling the output:
 -D, --pgdata=DIRECTORY receive base backup into directory
 -F, --format=p|t output format (plain (default), tar)
 -r, --max-rate=RATE maximum transfer rate to transfer data directory
 (in kB/s, or use suffix "k" or "M")
 -R, --write-recovery-conf
 write recovery.conf after backup
 -T, --tablespace-mapping=OLDDIR=NEWDIR
 relocate tablespace in OLDDIR to NEWDIR
 -x, --xlog include required WAL files in backup (fetch mode)
 -X, --xlog-method=fetch|stream
 include required WAL files with specified method
 --xlogdir=XLOGDIR location for the transaction log directory
 -z, --gzip compress tar output
 -Z, --compress=0-9 compress tar output with given compression level

General options:
 -c, --checkpoint=fast|spread
 set fast or spread checkpointing
 -l, --label=LABEL set backup label
 -P, --progress show progress information
 -v, --verbose output verbose messages
 -V, --version output version information, then exit
 -?, --help show this help, then exit

Connection options:
 -d, --dbname=CONNSTR connection string
 -h, --host=HOSTNAME database server host or socket directory
 -p, --port=PORT database server port number
 -s, --status-interval=INTERVAL
 time between status packets sent to server (in seconds)
 -U, --username=NAME connect as specified database user
 -w, --no-password never prompt for password
 -W, --password force password prompt (should happen automatically)

A basic call to pg_basebackup would look like this:

chantal:dbhs$ pg_basebackup -D /target_directory \
-h sample.postgresql-support.de

In this example, we will fetch the base backup from sample.postgresql-support.de, and put it into our local directory called /target_directory. It just takes this simple line to copy an entire database instance to the target system.

When we create a base backup as shown in this section, pg_basebackup will connect to the server and wait for a checkpoint to happen before the actual copy process is started. In this mode, this is necessary because the replay process will start exactly at this point in the XLOG. The problem is that it might take a while until a checkpoint kicks in; pg_basebackup does not enforce a checkpoint on the source server straightaway so as to make sure that normal operations are not disturbed.

Tip

If you don't want to wait on a checkpoint, consider using --checkpoint=fast. This will enforce a checkpoint and pg_basebackup will start copying instantly.

By default, a plain base backup will be created, consisting of all the files in the directories found on the source server. If the base backup has to be stored on tape, we recommend that you give --format=t a try. It will automatically create a tar archive (maybe on a tape). If you want to move data to a tape, you can easily save an intermediate step in this way. When using tar, it is usually quite beneficial to combine it with --gzip to reduce the size of the base backup on disk.
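The benefit of --gzip can be illustrated locally without a server: repetitive data, as database heap files often are, shrinks substantially under gzip. The file names and contents below are purely illustrative:

```shell
# Rough illustration of why gzipping tar output pays off.
DEMO=$(mktemp -d)
yes 'PostgreSQL rows often compress well' | head -n 100000 > "$DEMO/relfile"
tar -cf  "$DEMO/base.tar"    -C "$DEMO" relfile   # plain tar
tar -czf "$DEMO/base.tar.gz" -C "$DEMO" relfile   # gzipped tar
PLAIN=$(wc -c < "$DEMO/base.tar")
PACKED=$(wc -c < "$DEMO/base.tar.gz")
echo "plain: $PLAIN bytes, gzipped: $PACKED bytes"
```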

Tip

There is also a way to see a progress bar while performing the base backup, but we don't recommend using this option (--progress) because it requires pg_basebackup to determine the size of the source instance first, which can be costly.

Backup throttling

If your master has only a weak I/O subsystem, it can happen that a full blown base backup creates so much load on the master's I/O system that normal transactions begin to suffer. Performance goes down and latency goes up. This is clearly an undesirable thing. Therefore, pg_basebackup allows backup speeds to be reduced. By reducing the throughput of the backup tool, the master will have more capacity left to serve normal requests.

To reduce the speed of the backup, you can use the -r (--max-rate) command-line switch. The expected unit is kilobytes per second; with the k or M suffix, the rate can also be given in kilobytes or megabytes. Just pass the desired throughput as a parameter and stretch the backup to make sure that your master server does not suffer from the backup. The optimal value for this setting highly depends on your hardware infrastructure and performance requirements, so administrators have to determine the right speed as needed.
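A quick back-of-the-envelope calculation helps when picking a rate. Assuming a hypothetical 10 GB instance throttled to 20 MB/s (which would be --max-rate=20M), the backup takes roughly:

```shell
# Estimate how long a throttled base backup will run.
DB_SIZE_MB=10240       # hypothetical instance size: ~10 GB
RATE_MB=20             # throttle: 20 MB/s (--max-rate=20M)
SECONDS_NEEDED=$(( DB_SIZE_MB / RATE_MB ))
echo "$SECONDS_NEEDED seconds"   # 512 seconds, about 8.5 minutes
```

If that duration is acceptable to your client applications, the rate is a reasonable starting point; otherwise, lower it further and accept a longer backup window.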

pg_basebackup – self-sufficient backups

Usually, a base backup without XLOG is pretty worthless. This is because the base backup is taken while the master is fully operational. While the backup is taken, the storage files in the database instance might have been modified heavily. The purpose of the XLOG is to fix those potential issues in the data files reliably.

But what if we want to create a base backup that can live without (explicitly archived) XLOG? In such a case, we can use the --xlog-method=stream option. If this option has been chosen, pg_basebackup will not only copy the data as it is, but also stream the XLOG created during the base backup to our destination server. This provides us with just enough XLOG to start the base backup directly. The backup is self-sufficient and does not need additional XLOG files. This is not Point-in-time Recovery, but it can come in handy if there is trouble. Having a base backup that can be started right away is usually a good thing, and it comes at fairly low cost.

Tip

Note that --xlog-method=stream will require two database connections to the source server, not just one. You have to keep that in mind when adjusting max_wal_senders on the source server.

If you are planning to use Point-in-time Recovery and if there is absolutely no need to start the backup as it is, you can safely skip the XLOG and save some space this way (default mode).

Making use of traditional methods to create base backups

These days, pg_basebackup is the most common way of getting an initial copy of a database server. This has not always been the case. Traditionally, a different method was used, which worked as follows:

  1. Call SELECT pg_start_backup('some label');.
  2. Copy all data files to the remote box through rsync or any other means.
  3. Run SELECT pg_stop_backup();.
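The three steps above can be sketched as a single script. The host name and paths below are hypothetical, and running it for real requires a reachable master and rsync access; here we only write the script out and syntax-check it rather than contact a server:

```shell
# Sketch of the traditional base backup procedure (hypothetical host
# and paths). Written to a file and syntax-checked only.
cat > /tmp/traditional_base_backup.sh <<'EOF'
#!/bin/bash
set -e
# Step 1: put the master into backup mode.
psql -h sample.postgresql-support.de -c "SELECT pg_start_backup('base');"
# Step 2: copy the data files, throttled to ~20 MB/s.
rsync -a --bwlimit=20000 \
    sample.postgresql-support.de:/var/lib/pgsql/data/ /target_directory/
# Step 3: leave backup mode.
psql -h sample.postgresql-support.de -c "SELECT pg_stop_backup();"
EOF
bash -n /tmp/traditional_base_backup.sh && echo "syntax OK"
```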

The main advantage of this old method is that there is no need to open a database connection, and no need to configure the XLOG streaming infrastructure on the source server.

Another main advantage is that you can make use of features such as ZFS snapshots or similar means, which can dramatically reduce the amount of I/O needed to create an initial backup.

Tip

Once you have called pg_start_backup, there is no need to hurry. It is not necessary, and not even desirable, to leave backup mode quickly. Nothing will happen if you stay in backup mode for days. PostgreSQL will archive the transaction log as usual, and the user won't face any kind of downside. Of course, it is a bad habit not to close backups soon and properly. However, the way PostgreSQL works internally does not change while a base backup is running. There is nothing filling up, no disk I/O delayed, or anything of this sort. The backup process just copies files to the slave, and is therefore not invasive at all.

Tablespace issues

If you happen to use more than one tablespace, pg_basebackup will handle this just fine, provided the filesystem layout on the target box is identical to the filesystem layout on the master. However, if your target system does not use the same filesystem layout, there will be a bit more work to do. Using the traditional way of doing the base backup might be beneficial in this case.

If you are using --format=t (for tar), you will be provided with one tar file per tablespace.

Keeping an eye on the network bandwidth

In the previous section, it has already been noted that a weak server can easily fall victim to fast backups consuming all of the I/O capacity. The solution to this problem is the --max-rate setting. Throttling the backup is also highly recommended when using rsync as the backup method.

Tip

If you want to limit rsync to, say, 20 MB/sec, you can simply use rsync --bwlimit=20000. This will definitely cause creation of the base backup to take longer, but it will make sure that your client apps don't face problems. In general, we recommend a dedicated network which connects the master and slave to make sure that a base backup does not affect normal operations.

Of course, you can use any other tools to copy data and achieve similar results.

Tip

If you are using gzip compression with --gzip, it can work as an implicit speed brake. However, this is mainly a workaround.
