- Apache Hadoop 3 Quick Start Guide
- Hrishikesh Vijay Karambelkar
Velocity of data and other factors
The velocity at which data is generated and transferred to the Hadoop cluster also impacts cluster sizing. Consider two data-ingestion scenarios, each producing data in GBs per minute, as shown in the following diagram:

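The effect of velocity on sizing can be sketched numerically. In this illustrative example (the sample rates are assumptions, not figures from the book), both scenarios ingest the same total data, but the cluster must be provisioned for the peak rate rather than the average:

```python
# Illustrative sketch: bursty vs. steady ingestion of the same daily total.
# The per-minute rates below are made-up numbers for illustration only.

bursty = [0, 0, 120, 0, 120, 0]    # GB per minute, arrives in spikes (scenario 1)
steady = [40, 40, 40, 40, 40, 40]  # GB per minute, consistent flow (scenario 2)

assert sum(bursty) == sum(steady)  # identical total data over the window

# Capacity must cover the peak ingestion rate, not the average,
# so the bursty scenario needs about 3x the intake capacity here.
print("bursty peak:", max(bursty), "GB/min")  # → bursty peak: 120 GB/min
print("steady peak:", max(steady), "GB/min")  # → steady peak: 40 GB/min
```

This is why scenario 1 calls for more hardware than scenario 2 even though the daily volumes match.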
In the preceding diagram, both scenarios generate the same amount of data each day, but at different velocities. In the first scenario, data arrives in spikes, whereas in the second it flows at a consistent rate. Scenario 1 therefore requires more hardware, with additional CPUs or GPUs and storage, than scenario 2, because the cluster must be provisioned for peak load rather than average load. Many other parameters can also influence cluster sizing; for example, the type of data affects the compression factor of your cluster. Compression can be achieved with gzip, bzip2, and other compression utilities, and textual data usually compresses better. Similarly, intermediate storage adds a further 25% to 35% to the requirement; it is used by MapReduce tasks to store intermediate results of processing. You can access an example Hadoop sizing calculator here.
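The factors above (retention, compression, HDFS replication, and the 25% to 35% intermediate-storage overhead) can be combined into a rough estimate. This is a minimal sketch, not the book's sizing calculator; the function name and default ratios are assumptions chosen for illustration (HDFS defaults to a replication factor of 3):

```python
def estimate_cluster_storage_gb(daily_ingest_gb: float,
                                retention_days: int,
                                replication_factor: int = 3,
                                compression_ratio: float = 0.5,
                                intermediate_overhead: float = 0.30) -> float:
    """Rough storage estimate: compress the raw data, replicate it in HDFS,
    then add headroom for MapReduce intermediate results."""
    raw_gb = daily_ingest_gb * retention_days
    compressed_gb = raw_gb * compression_ratio          # e.g. 2:1 for textual data
    replicated_gb = compressed_gb * replication_factor  # HDFS block replication
    return replicated_gb * (1 + intermediate_overhead)  # 25-35% scratch space

# e.g. 100 GB/day kept for 30 days, 2:1 compression, replication of 3,
# and a 25% intermediate-storage overhead
print(estimate_cluster_storage_gb(100, 30, intermediate_overhead=0.25))  # → 5625.0
```

Treat the result as a starting point only; real deployments should also budget for OS, logs, and growth.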