- PostgreSQL Replication(Second Edition)
- Hans Jürgen Sch?nig
- 1683字
- 2021-07-16 13:33:50
Making use of replication slots
In PostgreSQL 9.4, a major new feature called "replication slots" has been introduced. The idea is to give users and tools alike a chance to connect to the transaction log stream in a standard way and consume data.
Basically, two types of replication slots exist:
- Physical replication slots
- Logical replication slots
The following two sections describe those two types of replication slots in detail.
Physical replication slots
Physical replication slots are an important new feature of PostgreSQL 9.4. The idea is that a client can create a replication slot to make sure that the server only discards what has really made it to the client. Remember that in PostgreSQL, the transaction log is normally recycled as soon as a certain amount of new XLOG has been created. To the streaming replication slaves or some other clients, this can turn out to be disastrous, because if the slave/client cannot consume the transaction log fast enough or if there is simply not enough bandwidth, it can happen that the master throws away stuff that is actually still needed in the future. By giving clients an actual name, the master knows when to recycle the XLOG and make sure that the client won't lose sight of it.
Of course, a replication slot can also be dangerous. What if a slave disconnects and does not come back within a reasonable amount of time to consume all of the XLOG kept in stock? In this case, the consequences for the master are not good, because by the time the master fills up, it might be too late.
To use replication slots in your setup, you have to tell PostgreSQL via postgresql.conf
to allow this feature, otherwise the system will error out immediately:
test=# SELECT * FROM pg_create_physical_replication_slot('my_slot'); ERROR: replication slots can only be used if max_replication_slots > 0
In the next example, max_replication_slots
is set to 10
in postgresql.conf
. Keep in mind that a restart is required:
max_replication_slots = 10 # max number of replication slots # (change requires restart)
Before the database is restarted, make sure that wal_level
is set to at least archive to make this work. Once the database has been restarted, the replication slot can be created:
test=# SELECT * FROM pg_create_physical_replication_slot('my_slot'); slot_name | xlog_position -----------+--------------- my_slot | (1 row)
In this scenario, a replication slot called my_slot
has been created. So far, no XLOG position has been assigned to it yet.
In the next step, it is possible to check which replication slots already exist. In this example, there is only one, of course. To retrieve the entire list, just select from pg_replication_slots
:
test=# \d pg_replication_slots View "pg_catalog.pg_replication_slots" Column | Type | Modifiers --------------+---------+----------- slot_name | name | plugin | name | slot_type | text | datoid | oid | database | name | active | boolean | xmin | xid | catalog_xmin | xid | restart_lsn | pg_lsn |
The important thing here is that a physical replication slot has been created. It can be used to stream XLOG directly, so it is really about physical copies of the data.
If a replication slot is not needed any more, it can be deleted. To do so, the following instruction can be helpful:
test=# SELECT pg_drop_replication_slot('my_slot'); pg_drop_replication_slot -------------------------- (1 row)
It is important to clean out replication slots as soon as they are not needed any more. Otherwise, it might happen that the server producing the XLOG fills up and faces troubles because of filled-up filesystems. It is highly important to keep an eye on those replication slots and make sure that cleanup of spare slots really happens.
Logical replication slots
In contrast to physical replication slots, logical replication slots return decoded messages through a mechanism called logical decoding. The main idea is to have a means of extracting changes going on in the database directly by connecting to the XLOG. The beauty of this concept is that a plugin can be used to present data in any format desired by the administrator. The output plugin reads data through a standard API and transforms things into the desired output format. No changes to the core of PostgreSQL are required because modules can be loaded on demand.
The following example shows how a simple plugin, called test_decoding
, can be utilized to dissect the XLOG:
test=# SELECT * FROM pg_create_logical_replication_slot('slot_name', 'test_decoding'); slot_name | xlog_position -----------+--------------- slot_name | D/438DCEB0 (1 row)
As soon as the replication slot is created, the transaction log position is returned. This is important to know because from this position onward, the decoded XLOG will be sent to the client.
The replication slot is, of course, visible in the system view:
test=# \x Expanded display is on. test=# SELECT * FROM pg_replication_slots; -[ RECORD 1 ]+-------------- slot_name | slot_name plugin | test_decoding slot_type | logical datoid | 21589 database | test active | f xmin | catalog_xmin | 937 restart_lsn | D/438DCE78
The replication slot can now be used. Logically, at this point, it won't return anything because nothing has happened in the database system so far:
test=# SELECT * FROM pg_logical_slot_get_changes('slot_name', NULL, NULL); (No rows)
It is now possible to generate a table and insert some data to make use of the replication slot:
test=# CREATE TABLE t_test (id serial, name text, PRIMARY KEY (id)); CREATE TABLE
For the sake of simplicity, a table consisting of only two columns is enough. In the next step, two rows are added:
test=# INSERT INTO t_test (name) VALUES ('hans'), ('paul') RETURNING *; id | name ----+------ 1 | hans 2 | paul (2 rows)
Now it is possible to fetch data from the replication slot:
test=# SELECT * FROM pg_logical_slot_get_changes('slot_name', NULL, NULL); location | xid | data ------------+-----+-------------------------------------------------------------- D/438DD1C8 | 937 | BEGIN 937 D/438FCE98 | 937 | COMMIT 937 D/438FD030 | 938 | BEGIN 938 D/438FD030 | 938 | table public.t_test: INSERT: id[integer]:1 name[text]:'hans' D/438FD138 | 938 | table public.t_test: INSERT: id[integer]:2 name[text]:'paul' D/438FD208 | 938 | COMMIT 938 (6 rows)
A couple of things need to be taken care of here. One of them is that DDLs (in our case, CREATE TABLE
) are not decoded and cannot be replicated through replication slots. Therefore the 937
transaction is just an empty thing. The more important details can be seen in the 938
transaction. As data has been added in one transaction, data is also logically decoded in a single transaction.
The important thing is that data is only decoded once. If the function is called again, it won't return anything in this case:
test=# SELECT * FROM pg_logical_slot_get_changes('slot_name', NULL, NULL); location | xid | data ----------+-----+------ (0 rows)
Let's add some more data:
test=# INSERT INTO t_test (name) VALUES ('john') RETURNING *; id | name ----+------ 3 | john (1 row)
There is a function called pg_logical_slot_peek_changes
. It can be used to look ahead and see what is next in the queue. It does not actually dequeue the data, so it is available for later consumption:
test=# SELECT * FROM pg_logical_slot_peek_changes('slot_name', NULL, NULL); location | xid | data ------------+-----+-------------------------------------------------------------- D/438FD240 | 939 | BEGIN 939 D/438FD240 | 939 | table public.t_test: INSERT: id[integer]:3 name[text]:'john' D/438FD310 | 939 | COMMIT 939 (3 rows)
It is possible to dequeue data multiple times. PostgreSQL also offers the opportunity to pass on a parameter to the decoding module. In this example, a timestamp is desired:
test=# SELECT * FROM pg_logical_slot_peek_changes('slot_name', NULL, NULL, 'include-timestamp', 'on'); location | xid | data ------------+-----+-------------------------------------------------------------- D/438FD240 | 939 | BEGIN 939 D/438FD240 | 939 | table public.t_test: INSERT: id[integer]:3 name[text]:'john' D/438FD310 | 939 | COMMIT 939 (at 2015-02-11 15:38:08.00209+01) (3 rows)
The system will add a timestamp to the end of the commit record of the row.
Configuring replication identities
Starting with PostgreSQL 9.4, there is also some new functionality related to replication slots in ALTER TABLE
. The idea is to give users control over the amount of transaction log created during the UPDATE
command. In general, the goal of the transaction log is to allow the server to repair itself. If replication slots are used, it is necessary to know a bit more about the change.
Here is a simple UPDATE
command:
test=# UPDATE t_test SET id = id * 10 WHERE id = 3; UPDATE 1
Normally, UPDATE
is presented by the replication slot as follows:
test=# SELECT * FROM pg_logical_slot_get_changes('slot_name', NULL, NULL); location | xid | data ------------+-----+------------------------------------------------------------------------------------------------- D/439095E8 | 940 | BEGIN 940 D/439095E8 | 940 | table public.t_test: UPDATE: old-key: id[integer]:3 new-tuple: id[integer]:30 name[text]:'john' D/43909838 | 940 | COMMIT 940 (3 rows)
The new and the old keys are represented. However, somebody might want to see more. To extract more information, ALTER TABLE
can be called. The REPLICA IDENTITY
parameter can be set to FULL
:
test=# ALTER TABLE t_test REPLICA IDENTITY FULL; ALTER TABLE test=# UPDATE t_test SET id = id * 10 WHERE id = 1; UPDATE 1
The UPDATE
command will now create a more verbose representation of the changed row. The content is way more beefy and makes the functioning of many client applications a bit easier:
test=# SELECT * FROM pg_logical_slot_get_changes('slot_name', NULL, NULL); location | xid | data ------------+-----+-------------------------------------------------------------------------------------------------- D/4390B818 | 944 | BEGIN 944 D/4390B818 | 944 | table public.t_test: UPDATE: old-key: id[integer]:1 name[text]:'hans' new-tuple: id[integer]:10 name[text]:'hans' D/4390B908 | 944 | COMMIT 944 (3 rows)
The most important difference is that in the case of the name
column, both versions (before and after) are included in the textual representation.
At this point, PostgreSQL provides four different levels of REPLICA IDENTITY
:
DEFAULT
: This records the old values of the columns of the primary key, if any.USING INDEX
: This index records the old values of the columns covered by the named index, which must be unique, not partial, and not deferrable, and must include only columns marked asNOT NULL
.FULL
: This records the old values of all the columns in the row.NOTHING
: This records no information about the old row.
The default setting is usually fine. However, in some situations (especially auditing), a bit more information is definitely worth creating.
- 國際大學生程序設計競賽中山大學內部選拔真題解(二)
- Java EE框架整合開發入門到實戰:Spring+Spring MVC+MyBatis(微課版)
- 算法基礎:打開程序設計之門
- R的極客理想:工具篇
- Getting Started with Hazelcast(Second Edition)
- Learning Laravel's Eloquent
- Mastering openFrameworks:Creative Coding Demystified
- Python深度學習原理、算法與案例
- 深入剖析Java虛擬機:源碼剖析與實例詳解(基礎卷)
- Unity&VR游戲美術設計實戰
- 軟件工程與UML案例解析(第三版)
- 高質量程序設計指南:C++/C語言
- Practical Linux Security Cookbook
- 軟件自動化測試實戰解析:基于Python3編程語言
- Visual C++網絡編程教程(Visual Studio 2010平臺)