官术网_书友最值得收藏!

Making use of replication slots

In PostgreSQL 9.4, a major new feature called "replication slots" has been introduced. The idea is to give users and tools alike a chance to connect to the transaction log stream in a standard way and consume data.

Basically, two types of replication slots exist:

  • Physical replication slots
  • Logical replication slots

The following two sections describe those two types of replication slots in detail.

Physical replication slots

Physical replication slots are an important new feature of PostgreSQL 9.4. The idea is that a client can create a replication slot to make sure that the server only discards what has really made it to the client. Remember that in PostgreSQL, the transaction log is normally recycled as soon as a certain amount of new XLOG has been created. To the streaming replication slaves or some other clients, this can turn out to be disastrous, because if the slave/client cannot consume the transaction log fast enough or if there is simply not enough bandwidth, it can happen that the master throws away stuff that is actually still needed in the future. By giving clients an actual name, the master knows when to recycle the XLOG and make sure that the client won't lose sight of it.

Of course, a replication slot can also be dangerous. What if a slave disconnects and does not come back within a reasonable amount of time to consume all of the XLOG kept in stock? In this case, the consequences for the master are not good, because by the time the master fills up, it might be too late.

To use replication slots in your setup, you have to tell PostgreSQL via postgresql.conf to allow this feature, otherwise the system will error out immediately:

test=# SELECT * FROM pg_create_physical_replication_slot('my_slot');
ERROR:  replication slots can only be used if max_replication_slots > 0

In the next example, max_replication_slots is set to 10 in postgresql.conf. Keep in mind that a restart is required:

max_replication_slots = 10      # max number of replication slots
                                # (change requires restart)

Before the database is restarted, make sure that wal_level is set to at least archive to make this work. Once the database has been restarted, the replication slot can be created:

test=# SELECT * FROM pg_create_physical_replication_slot('my_slot');
 slot_name | xlog_position
-----------+---------------
 my_slot   |
(1 row)

In this scenario, a replication slot called my_slot has been created. So far, no XLOG position has been assigned to it yet.

In the next step, it is possible to check which replication slots already exist. In this example, there is only one, of course. To retrieve the entire list, just select from pg_replication_slots:

test=# \d pg_replication_slots
View "pg_catalog.pg_replication_slots"
    Column    |  Type   | Modifiers
--------------+---------+-----------
 slot_name    | name    |
 plugin       | name    |
 slot_type    | text    |
 datoid       | oid     |
 database     | name    |
 active       | boolean |
 xmin         | xid     |
 catalog_xmin | xid     |
 restart_lsn  | pg_lsn  |

The important thing here is that a physical replication slot has been created. It can be used to stream XLOG directly, so it is really about physical copies of the data.

If a replication slot is not needed any more, it can be deleted. To do so, the following instruction can be helpful:

test=# SELECT pg_drop_replication_slot('my_slot');
 pg_drop_replication_slot
--------------------------

(1 row)

It is important to clean out replication slots as soon as they are not needed any more. Otherwise, it might happen that the server producing the XLOG fills up and faces troubles because of filled-up filesystems. It is highly important to keep an eye on those replication slots and make sure that cleanup of spare slots really happens.

Logical replication slots

In contrast to physical replication slots, logical replication slots return decoded messages through a mechanism called logical decoding. The main idea is to have a means of extracting changes going on in the database directly by connecting to the XLOG. The beauty of this concept is that a plugin can be used to present data in any format desired by the administrator. The output plugin reads data through a standard API and transforms things into the desired output format. No changes to the core of PostgreSQL are required because modules can be loaded on demand.

The following example shows how a simple plugin, called test_decoding, can be utilized to dissect the XLOG:

test=# SELECT * FROM pg_create_logical_replication_slot('slot_name', 'test_decoding');
 slot_name | xlog_position
-----------+---------------
 slot_name | D/438DCEB0
(1 row)

As soon as the replication slot is created, the transaction log position is returned. This is important to know because from this position onward, the decoded XLOG will be sent to the client.

The replication slot is, of course, visible in the system view:

test=# \x
Expanded display is on.
test=# SELECT * FROM pg_replication_slots;
-[ RECORD 1 ]+--------------
slot_name    | slot_name
plugin       | test_decoding
slot_type    | logical
datoid       | 21589
database     | test
active       | f
xmin         |
catalog_xmin | 937
restart_lsn  | D/438DCE78

The replication slot can now be used. Logically, at this point, it won't return anything because nothing has happened in the database system so far:

test=# SELECT * FROM pg_logical_slot_get_changes('slot_name', NULL, NULL);
(No rows)

It is now possible to generate a table and insert some data to make use of the replication slot:

test=# CREATE TABLE t_test (id serial, name text, PRIMARY KEY (id));
CREATE TABLE

For the sake of simplicity, a table consisting of only two columns is enough. In the next step, two rows are added:

test=# INSERT INTO t_test (name) VALUES ('hans'), ('paul') RETURNING *;
 id | name
----+------
  1 | hans
  2 | paul
(2 rows)

Now it is possible to fetch data from the replication slot:

test=# SELECT * FROM pg_logical_slot_get_changes('slot_name', NULL, NULL);
  location  | xid |                             data
------------+-----+--------------------------------------------------------------
 D/438DD1C8 | 937 | BEGIN 937
 D/438FCE98 | 937 | COMMIT 937
 D/438FD030 | 938 | BEGIN 938
 D/438FD030 | 938 | table public.t_test: INSERT: id[integer]:1 name[text]:'hans'
 D/438FD138 | 938 | table public.t_test: INSERT: id[integer]:2 name[text]:'paul'
 D/438FD208 | 938 | COMMIT 938
(6 rows)

A couple of things need to be taken care of here. One of them is that DDLs (in our case, CREATE TABLE) are not decoded and cannot be replicated through replication slots. Therefore the 937 transaction is just an empty thing. The more important details can be seen in the 938 transaction. As data has been added in one transaction, data is also logically decoded in a single transaction.

The important thing is that data is only decoded once. If the function is called again, it won't return anything in this case:

test=# SELECT * FROM pg_logical_slot_get_changes('slot_name', NULL, NULL);
 location | xid | data
----------+-----+------
(0 rows)

Let's add some more data:

test=# INSERT INTO t_test (name) VALUES ('john') RETURNING *;
 id | name
----+------
  3 | john
(1 row)

There is a function called pg_logical_slot_peek_changes. It can be used to look ahead and see what is next in the queue. It does not actually dequeue the data, so it is available for later consumption:

test=# SELECT * FROM pg_logical_slot_peek_changes('slot_name', NULL, NULL);
  location  | xid |                             data
------------+-----+--------------------------------------------------------------
 D/438FD240 | 939 | BEGIN 939
 D/438FD240 | 939 | table public.t_test: INSERT: id[integer]:3 name[text]:'john'
 D/438FD310 | 939 | COMMIT 939
(3 rows)

It is possible to dequeue data multiple times. PostgreSQL also offers the opportunity to pass on a parameter to the decoding module. In this example, a timestamp is desired:

test=# SELECT * FROM pg_logical_slot_peek_changes('slot_name', NULL,
        NULL, 'include-timestamp', 'on');
  location  | xid |                             data
------------+-----+--------------------------------------------------------------
 D/438FD240 | 939 | BEGIN 939
 D/438FD240 | 939 | table public.t_test: INSERT: id[integer]:3 name[text]:'john'
 D/438FD310 | 939 | COMMIT 939 (at 2015-02-11 15:38:08.00209+01)
(3 rows)

The system will add a timestamp to the end of the commit record of the row.

Configuring replication identities

Starting with PostgreSQL 9.4, there is also some new functionality related to replication slots in ALTER TABLE. The idea is to give users control over the amount of transaction log created during the UPDATE command. In general, the goal of the transaction log is to allow the server to repair itself. If replication slots are used, it is necessary to know a bit more about the change.

Here is a simple UPDATE command:

test=# UPDATE t_test SET id = id * 10 WHERE id = 3;
UPDATE 1

Normally, UPDATE is presented by the replication slot as follows:

test=# SELECT * FROM pg_logical_slot_get_changes('slot_name', NULL, NULL);
  location  | xid |                                              data
------------+-----+-------------------------------------------------------------------------------------------------
 D/439095E8 | 940 | BEGIN 940
 D/439095E8 | 940 | table public.t_test: UPDATE: old-key: id[integer]:3 new-tuple: id[integer]:30 name[text]:'john'
 D/43909838 | 940 | COMMIT 940
(3 rows)

The new and the old keys are represented. However, somebody might want to see more. To extract more information, ALTER TABLE can be called. The REPLICA IDENTITY parameter can be set to FULL:

test=# ALTER TABLE t_test REPLICA IDENTITY FULL;
ALTER TABLE


test=# UPDATE t_test SET id = id * 10 WHERE id = 1;
UPDATE 1

The UPDATE command will now create a more verbose representation of the changed row. The content is way more beefy and makes the functioning of many client applications a bit easier:

test=# SELECT * FROM pg_logical_slot_get_changes('slot_name', NULL, NULL);
  location  | xid |                                                       data

------------+-----+--------------------------------------------------------------------------------------------------
 D/4390B818 | 944 | BEGIN 944
 D/4390B818 | 944 | table public.t_test: UPDATE: old-key: id[integer]:1 name[text]:'hans'
      new-tuple: id[integer]:10 name[text]:'hans'
 D/4390B908 | 944 | COMMIT 944
(3 rows)

The most important difference is that in the case of the name column, both versions (before and after) are included in the textual representation.

At this point, PostgreSQL provides four different levels of REPLICA IDENTITY:

  • DEFAULT: This records the old values of the columns of the primary key, if any.
  • USING INDEX: This index records the old values of the columns covered by the named index, which must be unique, not partial, and not deferrable, and must include only columns marked as NOT NULL.
  • FULL: This records the old values of all the columns in the row.
  • NOTHING: This records no information about the old row.

The default setting is usually fine. However, in some situations (especially auditing), a bit more information is definitely worth creating.

主站蜘蛛池模板: 公安县| 文化| 南川市| 绩溪县| 句容市| 武宁县| 滦南县| 六盘水市| 河津市| 赞皇县| 稻城县| 青浦区| 阿荣旗| 九龙城区| 万宁市| 托克托县| 宣汉县| 阜阳市| 河源市| 从江县| 昌黎县| 霍山县| 贵阳市| 娱乐| 藁城市| 浦东新区| 阳信县| 尼玛县| 门头沟区| 黎川县| 衡山县| 余庆县| 施甸县| 手机| 茂名市| 桐柏县| 禹城市| 临清市| 长丰县| 柳河县| 行唐县|