Organizational data growth

Although Hadoop lets you add and remove nodes dynamically in an on-premises cluster, doing so is never a day-to-day task. So, when you approach sizing, you must be cognizant of data growth over the years. For example, if you are building a cluster to process social media analytics, and the organization expects to add x pages a month for processing, the sizing needs to be computed accordingly. You may start by computing the data generated in each year with the following formula:

Data Generated in Year X = Data Generated in Year (X-1) × (1 + % Growth) + Data coming from additional sources in Year X
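As a rough illustration, the formula can be applied year over year with a short Python sketch. The function name `project_yearly_data`, its parameters, and the sample figures (100 TB last year, 20% growth, 10 TB from new sources in the second and third years) are hypothetical values chosen for the example, not numbers from the sizing calculator.

```python
def project_yearly_data(last_year_tb, growth_rate, additional_tb_per_year):
    """Project the data generated in each future year, in TB.

    last_year_tb: data generated in the year before the projection starts.
    growth_rate: expected yearly growth as a fraction (0.20 means 20%).
    additional_tb_per_year: data expected from new sources, one entry per year.
    """
    projections = []
    previous = last_year_tb
    for additional_tb in additional_tb_per_year:
        # Data in year X = data in year (X-1) * (1 + growth) + additional sources
        current = previous * (1 + growth_rate) + additional_tb
        projections.append(current)
        previous = current
    return projections


# Hypothetical inputs: 100 TB last year, 20% growth, new sources from year 2 on.
yearly = project_yearly_data(100.0, 0.20, [0.0, 10.0, 10.0])
for year, size_tb in enumerate(yearly, start=1):
    print(f"Year {year}: {size_tb:.1f} TB generated")
```

Summing the yearly values gives an estimate of how much new data the cluster must absorb over the planning period, before any replication or overhead is taken into account.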

The following image shows a cluster sizing calculator, which can be used to compute the size of your cluster based on data growth (Excel attached). In this case, for the first year, last year's data can provide an initial size estimate:

While we work through storage sizing, it is worth pointing out another interesting difference between Hadoop and traditional storage systems: Hadoop does not require RAID servers. RAID adds little value here, primarily because HDFS already provides data replication, scalability, and high availability.