- PostgreSQL 9 High Availability Cookbook
- Shaun M. Thomas
Having enough IOPS
IOPS stands for Input/Output Operations Per Second. Essentially, this describes how many operations a device can perform per second before it should be considered saturated. If a device is saturated, further requests must wait until the device has spare bandwidth. An overwhelmed server can mean seconds, minutes, or even hours of delayed results.
Depending on application timeout settings and user patience, a device with low IOPS appears as a bottleneck that reduces both system responsiveness and the perception of quality. A database with insufficient IOPS to service queries in a timely manner is unavailable for all intents and purposes. It doesn't matter if PostgreSQL is still available and serving results in this scenario, as its availability has already suffered. We are trying to build a highly available database, and to do so, we need to build a server with enough performance to survive daily operation. In addition, we must overprovision for unexpected surges in popularity, and account for future storage and throughput needs based on monthly increases in storage utilization.
Getting ready
This process is more of a thought experiment. We will present some very rough estimates of IO performance for many different disk types. For each, we should increment entries in our hardware spreadsheet based on perceived need.
The main things we will need for this process are numbers. During development, applications commonly have a goal, expected client count, table count, estimated growth rates, and so on. Even if we have to guess for many of these, they will all contribute to our IOPS requirements. Have these numbers ready, even if they're simply guesses.
Tip
If the application already exists on a development or stage environment, try to get the development or QA team to run operational tests. This is a great opportunity to gather statistics before choosing potential production hardware.
How to do it...
We need to figure out how many operations per second we can expect. We can estimate this by using the following steps:
- Estimate the number of simultaneous database connections. Start with the expected user count and divide by 50.
- Obtain the average number of queries per page. If this is unavailable, use ten.
- Count the number of tables used in those queries. If this is unavailable, use three.
- Multiply these numbers together and double the result. Then multiply the total by eight.
- Increment the Count column in our hardware spreadsheet for one or more of the following, and round up:
    - For 3.5" hard drives, divide by 500
    - For 2.5" hard drives, divide by 350
    - For SSDs, divide by 25,000, then add two
- Add 10 percent to any count greater than 0 and then round up.
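The steps above amount to a short calculation. Here is a minimal sketch in Python; the defaults and divisors are the rules of thumb from this recipe, and `expected_users` is whatever estimate you have on hand:

```python
import math

def estimate_drive_counts(expected_users, queries_per_page=10, tables_per_query=3):
    """Rough IOPS and drive-count estimate using this recipe's rules of thumb."""
    connections = expected_users / 50        # assumes 50 queries/second per connection
    base_ops = connections * queries_per_page * tables_per_query
    iops = base_ops * 2 * 8                  # double for safety, x8 for three years of growth
    counts = {
        '3.5" drives': math.ceil(iops / 500),
        '2.5" drives': math.ceil(iops / 350),
        'SSDs': math.ceil(iops / 25000 + 2),
    }
    # Add 10 percent spares to any non-zero count, rounding up
    spares = {k: math.ceil(v * 1.1) for k, v in counts.items() if v > 0}
    return iops, spares
```

With 2,000 users per second, this returns 19,200 IOPS and 43, 61, and 4 drives respectively, matching the worked example later in this recipe.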
How it works...
Wow, that's a lot of work! There's a reason for everything, of course.
In the initial three steps, we're trying to figure out how many operations might touch an object on disk. For every user that's actively loading a page, for every query in that page, and for every table in that query, that's a potential disk read or write.
We double that number to account for the fact we're estimating all of this. It's a common engineering trick to double or triple calculations to absorb unexpected capacity, variance in materials, and so on. We can use that same technique here.
Note
Why did we suggest dividing the user count by 50 to get the connection total? Since we do not know the average query runtime, we assume 20 ms for each query. For every query that's executing, a connection is in use. Assuming full utilization, each connection can therefore service up to 50 queries per second. If you have a production system that can provide a better average query runtime, we suggest using that value instead.
However, why do we then multiply by eight? In a worst (or best) case scenario, it's not uncommon for an application to double the amount of users or requests on a yearly basis. Doubled usage means doubled hardware needs. If requirements double in one year, we would need a server three times more powerful (1 + 2) than the original estimates. Another doubling would mean a server seven times better (1 + 2 + 4). CPUs, RAM, and storage are generally available as powers of two. Since it's fairly difficult to obtain storage seven times faster than what we already have, we multiply the total by eight.
That gives a total IOPS value roughly necessary for our database to immediately serve every request for the next three years, straight from the disk device. Several companies buy servers every three or four years as a balance between cost and capacity, so these estimates are based on that assumption.
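The multiplier itself is easy to derive. This sketch simply restates the arithmetic above: sum three years of yearly doubling, then round up to the next power of two:

```python
import math

years = 3
# Cumulative capacity needed after yearly doubling: 1 + 2 + 4 = 7
total_growth = sum(2 ** year for year in range(years))
# Hardware is generally available in powers of two, so round up: 7 -> 8
multiplier = 2 ** math.ceil(math.log2(total_growth))
```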
In the next step, we produce a rough estimate of the number of disks necessary to serve the required IOPS. Our numbers in these steps are based on hard drive performance. A 15,000 RPM hard drive can serve, under ideal conditions, about 500 operations per second. Likewise, a 10,000 RPM drive can provide roughly 350 operations per second. As of this writing, current SSDs commonly reach 100,000 IOPS. However, because they are so fast, we need far fewer of them, and thus risk is not as evenly distributed. We artificially increase the count of these drives because, again, we are erring toward availability.
Finally, we add a few extra devices for spares that will go in a closet somewhere, just in case one or more drives fail. This also insulates us from the rare event that hardware is discontinued or otherwise difficult to obtain.
There's more...
Figuring out the number of IOPS we need and the devices involved is only part of the story.
A working example
Sometimes these large lists of calculations make more sense if we see them in practice. So let's make the assumption that 2,000 users will use our application each second. This is how that would look:
- 2000 / 50 = 40
- Default queries per page = 10
- Default tables per query = 3
- 40 * 10 * 3 * 2 = 2400
- 2400 * 8 = 19200
- 19200 IOPS in drives:
- 3.5" drives: 19200 / 500 = 38.4 ~ 39
- 2.5" drives: 19200 / 350 = 54.9 ~ 55
- SSDs: 2 + (19200 / 25000) = 2.8 ~ 3
- Add 10 percent.
- 3.5" drives: 39 + 3.9 = 42.9 ~ 43
- 2.5" drives: 55 + 5.5 = 60.5 ~ 61
- SSDs: 3 + 0.3 = 3.3 ~ 4
We are not taking space into account either, which would also increase our SSD count. We will be discussing capacity soon.
Making concessions
Our calculations always assume worst-case scenarios. This is both expensive and, in many cases, overzealous. We ignore RAM caching of disk blocks, we don't account for application frontend caches, and PostgreSQL's shared buffers are not included either.
Why? Crashes are always a concern. If a database crashes, buffers are forfeit. If the application frontend cache gets emptied or has problems, reads will be served directly from the database. Until caches are rebuilt, query results can be multiple orders of magnitude slower than normal for minutes or hours. We will discuss methods of circumventing these effects, but these IOPS numbers give us a baseline.
The number of necessary IOPS, and hence the disk requirements, is subject to risk evaluation and cost-benefit analysis. Deciding between 100 percent coverage and an acceptable fraction is a careful balancing act. Feel free to reduce these numbers; just consider the cost of an outage as part of the total. If a delay is considered standard operating procedure, fractions up to 50 percent are relatively low risk. If possible, try to run tests before purchase to reach a final decision.