
Velocity of data and other factors

The velocity at which data is generated and transferred to the Hadoop cluster also affects cluster sizing. Consider two scenarios of data population, with data generated in GBs per minute, as shown in the following diagram:

In the preceding diagram, both scenarios generate the same amount of data each day, but at different velocities. The first scenario has spikes of data, whereas the second sees a consistent flow. Because the cluster must absorb the peak rate, scenario 1 needs more hardware than scenario 2, with additional CPUs or GPUs and storage. Many other parameters can also influence the sizing of the cluster. For example, the type of data affects the compression factor you can achieve. Compression can be done with gzip, bzip2, and other compression utilities. Textual data usually compresses at a higher ratio. Similarly, intermediate storage requirements add a further 25% to 35%. Intermediate storage is used by MapReduce tasks to store intermediate results of processing. You can access an example Hadoop sizing calculator here.
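
The factors above can be combined into a rough back-of-the-envelope estimate. The following is a minimal sketch, not the sizing calculator referred to in the text. The function names, the retention period, the replication factor of 3, and the choice to apply the intermediate overhead on top of the replicated volume are all assumptions made for illustration; only the 25% to 35% intermediate storage range comes from the text.

```python
def peak_to_average(rates_gb_per_min):
    """Ratio of the peak ingest rate to the average ingest rate.

    A value near 1.0 means a steady flow (scenario 2); a larger value
    means spiky ingestion (scenario 1) that needs more headroom.
    """
    if not rates_gb_per_min:
        raise ValueError("at least one rate sample is required")
    average = sum(rates_gb_per_min) / len(rates_gb_per_min)
    if average == 0:
        raise ValueError("average rate must be greater than zero")
    return max(rates_gb_per_min) / average


def required_storage_gb(daily_raw_gb, retention_days, compression_ratio,
                        intermediate_overhead=0.30, replication_factor=3):
    """Rough storage estimate in GB (hypothetical formula).

    compression_ratio: raw size divided by compressed size (assumed input).
    intermediate_overhead: fraction reserved for MapReduce intermediate
        results; the text gives 25% to 35%.
    replication_factor: assumed value, not taken from the text.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    compressed = daily_raw_gb * retention_days / compression_ratio
    replicated = compressed * replication_factor
    return replicated * (1 + intermediate_overhead)


if __name__ == "__main__":
    # Two made-up samples with the same total volume but different velocity.
    spiky = [0, 0, 12, 0, 0, 12]
    steady = [4, 4, 4, 4, 4, 4]
    print("scenario 1 peak/average:", peak_to_average(spiky))
    print("scenario 2 peak/average:", peak_to_average(steady))
    print("storage estimate (GB):",
          required_storage_gb(1000, 30, compression_ratio=2.0))
```

Both sample series carry the same total volume, yet the spiky one has a higher peak-to-average ratio, which is the reason scenario 1 calls for more hardware. The storage figure is only as good as the assumed compression ratio, so it is worth measuring that on a sample of the real data.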
