
Failure domains

If your cluster will have fewer than 10 nodes, this is probably the most important point to consider.

With legacy scale-up storage, the hardware is expected to be 100% reliable. All components are redundant, yet the failure of a complete component, such as a system board or disk JBOD, would likely cause an outage. Therefore, there is no real knowledge of how such a failure might impact the operation of the system, just the hope that it doesn't happen! With Ceph, there is an underlying assumption that the complete failure of a section of your infrastructure, be that a disk, a node, or even a rack, should be considered normal and should not make your cluster unavailable.
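
To make this concrete, here is a toy Python sketch (my own illustration, not Ceph's actual CRUSH code; the host names, object counts, and replica count are made up) that mimics the effect of a CRUSH rule placing each replica on a different host. Because no two copies of an object share a host, losing any single node never removes the last copy of anything:

```python
# Toy model (not CRUSH itself): place each object's replicas on distinct hosts,
# so the loss of any one host never removes every copy of an object.
import random

HOSTS = [f"node{n}" for n in range(1, 6)]   # hypothetical 5-node cluster
REPLICAS = 3

def place(obj_id, hosts=HOSTS, replicas=REPLICAS):
    """Choose `replicas` distinct hosts for an object, loosely analogous to a
    'chooseleaf ... type host' style placement rule."""
    rng = random.Random(obj_id)              # deterministic per object, like a hash
    return rng.sample(hosts, replicas)

placements = {obj: place(obj) for obj in range(1000)}

# Simulate one host going offline: every object still has surviving replicas.
failed = "node3"
unavailable = [o for o, hosts in placements.items()
               if all(h == failed for h in hosts)]
print(f"Objects with no surviving replica after losing {failed}: {len(unavailable)}")
```

Running this always reports zero unavailable objects, which is exactly the property Ceph relies on: a whole failure domain can disappear and the data remains accessible from the surviving copies.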

Let's take two Ceph clusters, both comprising 240 disks. Cluster A comprises 20 nodes of 12 disks each; cluster B comprises 4 nodes of 60 disks each. Now, let's take a scenario where, for whatever reason, a Ceph OSD node goes offline. It could be due to planned maintenance or unexpected failure, but that node is now down and any data on it is unavailable. Ceph is designed to mask this situation and will even recover from it whilst maintaining full data access.

In the case of cluster A, we have now lost 5% of our disks and, in the event of a permanent loss, would have to reconstruct 72 TB of data (assuming 6 TB disks). Cluster B has lost 25% of its disks and would have to reconstruct 360 TB. The latter would severely impact the performance of the cluster, and in the case of data reconstruction, this period of degraded performance could last for many days.
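
A quick back-of-the-envelope sketch in Python reproduces these numbers; the 6 TB disk size is an assumption inferred from the 72 TB figure above:

```python
# Back-of-the-envelope check of the two layouts above, assuming 6 TB disks
# (the disk size is an assumption, inferred from 72 TB / 12 disks).
DISK_TB = 6

def node_failure_impact(nodes, disks_per_node, disk_tb=DISK_TB):
    """Return (% of cluster disks lost, TB to reconstruct) when one node fails."""
    total_disks = nodes * disks_per_node
    lost_pct = 100 * disks_per_node / total_disks
    lost_tb = disks_per_node * disk_tb
    return lost_pct, lost_tb

for name, nodes, disks in [("Cluster A", 20, 12), ("Cluster B", 4, 60)]:
    pct, tb = node_failure_impact(nodes, disks)
    print(f"{name}: one node down = {pct:.0f}% of disks, {tb} TB to reconstruct")
```

This prints 5% and 72 TB for cluster A versus 25% and 360 TB for cluster B, matching the figures quoted above.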

It's clear that on smaller clusters, these very large, dense nodes are not a good idea. A 10-node Ceph cluster is probably the minimum size if you want to reduce the impact of node failure, so in the case of 60-drive JBODs, you would need a cluster that, at a minimum, is measured in petabytes (with the 6 TB disks assumed above, 10 nodes of 60 drives is already 3.6 PB of raw capacity).
