
Organizational data growth

Although Hadoop lets you add and remove nodes dynamically in an on-premise cluster, doing so is not a day-to-day task. When you size a cluster, you must therefore account for data growth over the years. For example, if you are building a cluster to process social media analytics and the organization expects to add x pages a month for processing, the sizing must be computed accordingly. You can start by computing the data generated in each year with the following formula:

Data generated in year X = Data generated in year (X-1) × (1 + % growth) + Data coming from additional sources in year X
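To make the formula concrete, here is a minimal Python sketch that applies it year over year. The function name `project_data_growth`, the 100 TB baseline, the 20% growth rate, and the per-year additional-source figures are all hypothetical values chosen for illustration; substitute your organization's own estimates.

```python
def project_data_growth(base_tb, growth_rate, additional_tb_per_year):
    """Project data generated per year, in TB.

    base_tb: data generated in the year before the projection starts
    growth_rate: expected yearly growth as a fraction (0.20 means 20%)
    additional_tb_per_year: data expected from new sources, one entry per year
    """
    projections = []
    previous = base_tb
    for extra in additional_tb_per_year:
        # Year X = Year (X-1) * (1 + growth) + data from additional sources
        current = previous * (1 + growth_rate) + extra
        projections.append(current)
        previous = current
    return projections


# Illustrative numbers only: 100 TB last year, 20% growth,
# and 0, 10, and 10 TB arriving from new sources in years 1 to 3.
yearly = project_data_growth(100.0, 0.20, [0.0, 10.0, 10.0])
for year, size_tb in enumerate(yearly, start=1):
    print(f"Year {year}: {size_tb:.1f} TB")
```

With these assumed inputs, the projection comes to roughly 120 TB, 154 TB, and 194.8 TB for the three years, because each year's result feeds into the next year's calculation.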

The following image shows a cluster sizing calculator, which can be used to compute the size of your cluster based on data growth (Excel attached). In this case, for the first year, last year's data can provide an initial size estimate:

While we work through storage sizing, it is worth pointing out another interesting difference between Hadoop and traditional storage systems: Hadoop does not require RAID servers. RAID adds little value here, primarily because HDFS already provides data replication, scalability, and high availability.
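Because HDFS stores several copies of each block, replication has a direct effect on the raw disk capacity you need. The sketch below is a rough illustration of that effect, not a sizing rule from this text. It assumes the HDFS default replication factor of 3; the function name `raw_capacity_tb` and the 25% headroom for intermediate and temporary data are hypothetical values chosen for the example.

```python
def raw_capacity_tb(logical_tb, replication_factor=3, headroom=0.25):
    """Estimate raw disk capacity needed for a given logical data size.

    logical_tb: size of the data before replication, in TB
    replication_factor: copies HDFS keeps of each block (HDFS default is 3)
    headroom: assumed extra fraction reserved for intermediate/temporary data
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    # Every logical TB is stored replication_factor times, plus headroom.
    return logical_tb * replication_factor * (1 + headroom)


# Illustrative only: 100 TB of logical data.
needed = raw_capacity_tb(100.0)
print(f"Raw capacity needed: {needed:.1f} TB")
```

With these assumed inputs, 100 TB of logical data calls for about 375 TB of raw disk, which is why the yearly data estimate should be multiplied out before deciding on node and disk counts.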
