- PostgreSQL Replication (Second Edition)
- Hans-Jürgen Schönig
Tuning checkpoints and the XLOG
So far, this chapter has hopefully provided some insights into how PostgreSQL writes data and what the XLOG is used for in general. Given this knowledge, we can now move on and see what we can do to make our databases work even more efficiently, both in the case of replication and in the case of running just a single server.
Understanding the checkpoints
In this chapter, we have seen that data has to be written to the XLOG before it can go anywhere else. The problem is that if the XLOG were never deleted, we could not keep writing to it forever without filling up the disk at some point in time.
To solve this problem, the XLOG has to be deleted at some point. This process is called checkpointing.
The main question arising from this issue is, "When can the XLOG be truncated up to a certain point?" The answer is, "When PostgreSQL has put everything that is already in the XLOG into the storage files." If all the changes made to the XLOG are also made to the data files, the XLOG can be truncated.
Tip
Keep in mind that simply writing the data is worthless. We also have to flush the data to the data tables.
In a way, the XLOG can be seen as the repairman of the data files if something undesirable happens. If everything is fully repaired, the repair instructions can be removed safely; this is exactly what happens during a checkpoint.
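Checkpoints normally happen automatically in the background, but they can also be triggered by hand, which is handy for experiments. This is just a sketch; on a busy production system, a manual checkpoint can cause a noticeable burst of I/O:

CHECKPOINT;   -- forces an immediate checkpoint; requires superuser privileges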
Configuring the checkpoints
Checkpoints are highly important for consistency, but they are also highly relevant to performance. If checkpoints are configured poorly, you might face serious performance degradations.
When it comes to configuring checkpoints, the following parameters are relevant. Note that all of these parameters can be changed in postgresql.conf:

checkpoint_segments = 3
checkpoint_timeout = 5min
checkpoint_completion_target = 0.5
checkpoint_warning = 30s
In the following sections, we will take a look at each of these variables.
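If you prefer to check the currently active values from within the database instead of opening postgresql.conf, the pg_settings catalog view can be queried. The following statement is merely a convenience sketch; the LIKE pattern is just an assumption used to narrow the output down to the checkpoint-related settings:

SELECT name, setting, unit
FROM pg_settings
WHERE name LIKE 'checkpoint%';   -- lists checkpoint_segments, checkpoint_timeout, and so on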
Segments and timeouts
The checkpoint_segments and checkpoint_timeout parameters define the distance between two checkpoints. A checkpoint happens either when we run out of segments or when the timeout occurs.
Remember that a segment is usually 16 MB, so three segments means that we will perform a checkpoint every 48 MB. On modern hardware, a checkpoint distance of 48 MB is far from enough. In a typical production system, a checkpoint distance of 256 MB (measured in segments) or even higher is perfectly feasible.
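Translating a desired checkpoint distance into a setting is simple arithmetic: divide the distance by the 16 MB segment size. The value below is just this calculation for the 256 MB example, not a universal recommendation:

# 16 segments * 16 MB per segment = 256 MB between two checkpoints
checkpoint_segments = 16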
However, when setting checkpoint_segments, keep one thing in mind: in the event of a crash, PostgreSQL has to replay all the changes since the last checkpoint. If the distance between two checkpoints is unusually large, you might notice that your failed database instance takes too long to start up again. This should be avoided for the sake of availability.
Tip
There will always be a trade-off between performance and recovery times after a crash. You have to balance your configuration accordingly.
The checkpoint_timeout parameter is also highly relevant. It is the upper limit of the time allowed between two checkpoints. There is no point in increasing checkpoint_segments infinitely while leaving the time as it is. On large systems, increasing checkpoint_timeout has proven to make sense for many people.
Note
In PostgreSQL, you will notice that there is a constant number of transaction log files around. Unlike in other database systems, the number of XLOG files has nothing to do with the maximum size of a transaction; a transaction can easily be much larger than the distance between two checkpoints.
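To see whether your checkpoints are mostly triggered by the timeout or by running out of segments, the pg_stat_bgwriter system view can be consulted. This is only a small monitoring sketch; the exact set of columns can differ slightly between PostgreSQL versions:

SELECT checkpoints_timed,  -- checkpoints triggered by checkpoint_timeout
       checkpoints_req     -- checkpoints triggered by filling up the segments
FROM pg_stat_bgwriter;

If checkpoints_req dominates, the checkpoint distance is probably too small for your write load.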
To write or not to write?
You learned in this chapter that at COMMIT time, we cannot be sure whether the data is already in the data files or not.
So if the data files don't have to be consistent anyway, why not vary the point in time at which the data is written? This is exactly what we can do with the checkpoint_completion_target parameter. The idea is to have a setting that specifies the target of checkpoint completion as a fraction of the total time between two checkpoints.
Let's now discuss three scenarios to illustrate the purpose of the checkpoint_completion_target parameter.
Scenario 1 – storing stock market data
In scenario 1, we want to store the most recent stock quotes of all stocks in the Dow Jones Industrial Average (DJIA). We don't want to store the history of all stock prices but only the most recent, current price.
Given the type of data we are dealing with, we can assume that we will have a workload dictated by UPDATE statements.
What will happen now? PostgreSQL has to update the same data over and over again. Given the fact that the DJIA consists of only 30 different stocks, the amount of data is very limited and our table will be really small. In addition to this, the price might be updated every second or even more often.
Internally, the situation is like this: when the first UPDATE command comes along, PostgreSQL will grab a block, put it into memory, and modify it. Every subsequent UPDATE command will most likely change the same block. Logically, all writes have to go to the transaction log, but what happens to the cached blocks in the shared buffer?
The general rule is as follows: if there are many UPDATE commands (and as a result, many changes made to the same block), it is wise to keep the blocks in memory as long as possible. This will greatly increase the odds of avoiding I/O by writing multiple changes in one go.
Tip
If you want to increase the odds of having many changes in one disk I/O, consider decreasing checkpoint_completion_target. The blocks will stay in memory longer, and therefore many changes might go into the same block before a write occurs.
In the scenario just outlined, a checkpoint_completion_target value of 0.05 (or 5 percent) might be reasonable.
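If you are curious about how many modified blocks are actually sitting in shared memory during such an UPDATE-heavy workload, the pg_buffercache contrib module can show you. This is purely an illustrative sketch; it assumes that the extension is available and that you have superuser privileges:

CREATE EXTENSION pg_buffercache;

SELECT count(*) AS dirty_blocks
FROM pg_buffercache
WHERE isdirty;   -- blocks changed in shared buffers but not yet written to the data files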
Scenario 2 – bulk loading
In our second scenario, we will load 1 TB of data into an empty table. If you are loading so much data at once, what are the odds of hitting a block you have hit 10 minutes ago again? The odds are basically zero. There is no point in buffering writes in this case, because we would simply waste I/O capacity by idling and waiting for writes to hit the disk.
During a bulk load, we want to use all of the I/O capacity we have, all the time. To make sure that PostgreSQL writes data instantly, we have to increase checkpoint_completion_target to a value close to 1.
Scenario 3 – I/O spikes and throughput considerations
Sharp spikes can kill you; at least, they can do serious harm, which should be avoided. What is true in the real world around you is always true in the database world as well.
In this scenario, we want to assume an application storing the so-called Call Detail Records (CDRs) for a phone company. You can imagine that a lot of writing will happen and people will be placing phone calls all day long. Of course, there will be people placing a phone call that is instantly followed by the next call, but we will also witness a great number of people placing just one call a week or so.
Technically, this means that there is a good chance that a block in the shared memory that has recently been changed will face a second or a third change soon, but we will also have a great deal of changes made to blocks that will never be visited again.
How will we handle this? Well, it is a good idea to write data late so that as many changes as possible will go to pages that have been modified before. But what will happen during a checkpoint? If changes (in this case, dirty pages) have been held back for too long, the checkpoint itself will be intense, and many blocks will have to be written within a fairly short period of time. This can lead to an I/O spike. During an I/O spike, you will see that your I/O system is busy. It might show poor response times, and those poor response times can be felt by your end user.
This adds a new dimension to the problem—predictable response times.
Let's put it this way: assume you have used internet banking successfully for quite a while. You are happy. Now some guy at your bank has found a tweak that makes the database behind the internet banking 50 percent faster, but this gain comes with a downside: for two hours a day, the system will not be reachable. Clearly, from a performance point of view, the throughput will be better, as shown in this inequality:
24 hours * 1 X < 22 hours * 1.5 X
But are you, the customer, going to be happy? Clearly, you would not be. This is a typical use case where optimizing for maximum throughput does no good. If you can meet your performance requirements, it might be wiser to have evenly distributed response times at the cost of a small performance penalty. In our banking example, this would mean that your system is up 24x7 instead of just 22 hours a day.
Tip
Would you pay your mortgage more frequently if your internet banking was 10 times faster? Clearly, you would not. Sometimes, it is not about optimizing for as many transactions per second as possible, but about handling a predefined amount of load in the most reasonable way.
The same concept applies to the phone application we have outlined. We are not able to write all changes during the checkpoint anymore, because this might cause latency issues during a checkpoint. It is also no good to make changes to the data files more or less instantly (which means a high checkpoint_completion_target value), because we would be writing too much, too often.
This is a classic example where you have got to compromise. A checkpoint_completion_target value of 0.5 might be the best idea in this case.
Conclusion
The conclusion that should be drawn from these three examples is that no single configuration fits all purposes. You really have to think about the type of data you are dealing with in order to come up with a good and feasible configuration. For many applications, a value of 0.5 has proven to be just fine.
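To wrap up the three scenarios, the following postgresql.conf fragments show the kind of values discussed above. The numbers are illustrations taken from the scenarios, not drop-in recommendations, so always verify them against your own workload:

# Scenario 1 - small, UPDATE-heavy table (stock quotes):
# keep dirty blocks in memory as long as possible
checkpoint_completion_target = 0.05

# Scenario 2 - bulk loading 1 TB into an empty table:
# write data out as soon as possible (close to 1)
checkpoint_completion_target = 0.9

# Scenario 3 - CDR-style mixed workload:
# compromise between write combining and I/O spikes
checkpoint_completion_target = 0.5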
Tweaking WAL buffers
In this chapter, we have already adjusted some major parameters, such as shared_buffers, fsync, and so on. There is one more parameter, however, that can have a dramatic impact on performance. The wal_buffers parameter has been designed to tell PostgreSQL how much memory to keep around to remember the XLOG that has not been written to the disk so far. So, if somebody pumps in a large transaction, PostgreSQL will not write any "mini" change to the table to the XLOG before COMMIT. Remember that if a non-committed transaction is lost during a crash, we won't care about it anyway, because COMMIT is the only thing that really counts in everyday life. It makes perfect sense to write the XLOG in larger chunks before COMMIT happens. This is exactly what wal_buffers does. Unless changed manually in postgresql.conf, it is an auto-tuned parameter (represented by -1) that makes PostgreSQL take 3 percent of shared_buffers, but no more than 16 MB, to keep the XLOG around before writing it to disk.
Tip
In older versions of PostgreSQL, the wal_buffers parameter defaulted to 64 kB. That is unreasonably low for modern machines. If you are running an old version, consider increasing wal_buffers to 16 MB. This is usually a good value for reasonably sized database instances.
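A quick way to check what your instance is actually using, and to pin the value explicitly, is sketched below. The 16 MB figure simply mirrors the tip above; ALTER SYSTEM requires PostgreSQL 9.4 or later, and wal_buffers only changes after a restart:

SHOW wal_buffers;                        -- displays the currently active value

ALTER SYSTEM SET wal_buffers = '16MB';   -- written to postgresql.auto.conf
-- restart the server afterwards; wal_buffers cannot be changed on the fly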