- Mastering Apache Spark 2.x (Second Edition)
- Romeo Kienzler
- 2021-07-02 18:55:26
The cluster structure
The size and structure of your big data cluster will affect performance. A cloud-based cluster will typically have worse I/O and latency than an unshared hardware cluster, because you share the underlying hardware with other customers and the cluster hardware may be remote. There are exceptions. IBM Cloud, for instance, offers dedicated bare-metal, high-performance cluster nodes with an InfiniBand network connection, which can be rented on an hourly basis.
Additionally, the placement of cluster components on servers may cause resource contention. For instance, think carefully about where you locate Hadoop NameNodes, Spark servers, ZooKeeper, Flume, and Kafka servers in large clusters. With high workloads, you might consider segregating these servers onto individual systems. You might also consider using an Apache system such as Mesos, which provides better distribution and assignment of resources to the individual processes.
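As a rough illustration of the placement concern, the following minimal Python sketch flags hosts that carry more than one heavyweight service. The host names, the `placement` map, and the `contended_hosts` helper are all hypothetical and exist only for this example; a real cluster would take this information from its own inventory or cluster manager.

```python
# Hypothetical placement map: host name -> services running on that host.
placement = {
    "node1": ["NameNode", "ZooKeeper"],
    "node2": ["Spark master", "Kafka"],
    "node3": ["Flume"],
}


def contended_hosts(placement, max_services=1):
    """Return the hosts that run more than max_services services."""
    return {
        host: services
        for host, services in sorted(placement.items())
        if len(services) > max_services
    }


# Hosts listed here are candidates for moving a service to its own system.
crowded = contended_hosts(placement)
for host, services in crowded.items():
    print(f"{host}: {', '.join(services)}")
```

The threshold of one service per host is an assumption that matches the "segregate under high workloads" advice; lightly loaded clusters may tolerate more co-location.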
Consider potential parallelism as well. When processing large Datasets, the more workers your Spark cluster has, the greater the opportunity for parallelism. One rule of thumb is one worker per hyper-thread or virtual core.
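The rule of thumb can be turned into a quick back-of-the-envelope calculation. The sketch below is plain Python, not a Spark API; the function name `suggested_workers` and the optional `reserved_per_node` parameter (threads held back for the operating system and other daemons) are assumptions introduced for illustration.

```python
import os


def suggested_workers(nodes, logical_cores_per_node, reserved_per_node=0):
    """Estimate a worker count using one worker per hyper-thread or virtual core.

    reserved_per_node is an illustrative assumption: the number of logical
    cores per node held back for the OS and other services.
    """
    if nodes < 1 or logical_cores_per_node < 1:
        raise ValueError("nodes and logical_cores_per_node must be at least 1")
    # Always leave at least one usable logical core per node.
    usable = max(logical_cores_per_node - reserved_per_node, 1)
    return nodes * usable


# os.cpu_count() reports logical cores (hyper-threads) on the local machine.
local_cores = os.cpu_count() or 1
print(suggested_workers(1, local_cores))

# A hypothetical four-node cluster with 16 hyper-threads per node.
print(suggested_workers(4, 16))
print(suggested_workers(4, 16, reserved_per_node=1))
```

Treat the result as a starting point to tune against measured workloads, not as a fixed setting.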