Chapter 4. Parallel Collections and Futures

Data science often involves processing medium or large amounts of data. Since the previously exponential growth in the speed of individual CPUs has slowed down and the amount of data continues to increase, leveraging computers effectively must entail parallel computation.

In this chapter, we will look at ways of parallelizing computation and data processing on a single computer. Virtually all new computers have more than one processing unit, and distributing a calculation over these cores can be an effective way of speeding up medium-sized calculations.

Parallelizing calculations over a single chip is suitable for calculations involving gigabytes or a few terabytes of data. For larger data flows, we must resort to distributing the computation over several computers in parallel. We will discuss Apache Spark, a framework for parallel data processing, in Chapter 10, Distributed Batch Processing with Spark.

In this book, we will look at three common ways of leveraging parallel architectures in a single machine: parallel collections, futures, and actors. We will consider the first two in this chapter, and leave the study of actors to Chapter 9, Concurrency with Akka.
