
The cluster structure

The size and structure of your big data cluster will affect performance. A cloud-based cluster will typically show lower I/O throughput and higher latency than a cluster on unshared hardware, because you share the underlying hardware with other customers and the cluster hardware may be remote. There are some exceptions to this. The IBM cloud, for instance, offers dedicated bare-metal, high-performance cluster nodes with an InfiniBand network connection, which can be rented on an hourly basis.

Additionally, the placement of cluster components on servers may cause resource contention. In large clusters, think carefully about where you locate Hadoop NameNodes, Spark servers, ZooKeeper, Flume, and Kafka servers. With high workloads, you might consider moving each of these services onto its own system. You might also consider using an Apache system such as Mesos, which provides better distribution and assignment of resources to the individual processes.

Consider potential parallelism as well. For large Datasets, the more workers your Spark cluster has, the greater the opportunity for parallelism. One rule of thumb is one worker per hyper-thread or virtual core.
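The rule of thumb above can be turned into a quick back-of-the-envelope calculation. The following is a minimal sketch in plain Python, not part of Spark itself; the function name `suggested_workers` and the `reserved_per_node` parameter (logical cores held back for the operating system or co-located services) are assumptions introduced here for illustration.

```python
import os


def suggested_workers(nodes, cores_per_node, threads_per_core=2, reserved_per_node=0):
    """Estimate a worker count using one worker per hyper-thread or virtual core.

    nodes             -- number of machines in the cluster
    cores_per_node    -- physical cores on each machine
    threads_per_core  -- hyper-threads per physical core (use 1 if disabled)
    reserved_per_node -- hypothetical allowance: logical cores kept free per
                         machine for the OS or other services on the same host
    """
    if nodes < 1 or cores_per_node < 1 or threads_per_core < 1:
        raise ValueError("nodes, cores_per_node and threads_per_core must be >= 1")
    if reserved_per_node < 0:
        raise ValueError("reserved_per_node must be >= 0")

    logical_cores = cores_per_node * threads_per_core
    # Never go below zero if more cores are reserved than exist.
    usable_cores = max(logical_cores - reserved_per_node, 0)
    return nodes * usable_cores


if __name__ == "__main__":
    # os.cpu_count() reports the logical cores of the local machine (or None).
    local_logical_cores = os.cpu_count() or 1
    print("Logical cores on this machine:", local_logical_cores)
    print("4 nodes x 8 cores x 2 threads:", suggested_workers(4, 8, 2))
```

Treat the result as a starting point only. If NameNodes, ZooKeeper, or Kafka share the same hosts, a nonzero `reserved_per_node` is one way to account for the contention described earlier.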
