
Chapter 4. Parallel Collections and Futures

Data science often involves processing medium or large amounts of data. Because the once-exponential growth in the speed of individual CPUs has slowed while the amount of data continues to increase, using computers effectively requires parallel computation.

In this chapter, we will look at ways of parallelizing computation and data processing on a single computer. Virtually all new computers have more than one processing unit, and distributing a calculation over these cores can be an effective way of speeding up medium-sized calculations.

Parallelizing calculations across the cores of a single machine is suitable for calculations involving gigabytes or a few terabytes of data. For larger data flows, we must resort to distributing the computation over several computers in parallel. We will discuss Apache Spark, a framework for parallel data processing, in Chapter 10, Distributed Batch Processing with Spark.

In this book, we will look at three common ways of leveraging parallel architectures in a single machine: parallel collections, futures, and actors. We will consider the first two in this chapter, and leave the study of actors to Chapter 9, Concurrency with Akka.
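
As a first taste of what distributing a calculation over several cores can look like, here is a minimal sketch using futures from the Scala standard library. The workload (a sum of squares), the object name `ParallelSumSketch`, and the choice of four chunks are assumptions made purely for illustration. The sketch also relies on the default global execution context, which may not be the right choice for every application.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelSumSketch {

  // Hypothetical workload: sum the squares of a sequence of integers.
  def sumOfSquares(xs: Seq[Int]): Long =
    xs.foldLeft(0L)((acc, i) => acc + i.toLong * i)

  // Split 1 to n into at most nChunks pieces and process each piece
  // in its own future, then combine the partial results.
  def parallelSumOfSquares(n: Int, nChunks: Int): Long = {
    require(nChunks > 0, "nChunks must be positive")
    val chunkSize = math.max(1, (n + nChunks - 1) / nChunks)
    val chunks = (1 to n).grouped(chunkSize).toList

    // Each future is scheduled on the global thread pool, so the
    // chunks can run on different cores when they are available.
    val partialSums: List[Future[Long]] =
      chunks.map(chunk => Future(sumOfSquares(chunk)))

    // Turn the list of futures into a single future of the total.
    val total: Future[Long] = Future.sequence(partialSums).map(_.sum)

    // Blocking is acceptable in a small demonstration like this one.
    Await.result(total, 10.seconds)
  }

  def main(args: Array[String]): Unit = {
    val n = 1000
    println(s"sequential: ${sumOfSquares(1 to n)}")
    println(s"parallel:   ${parallelSumOfSquares(n, 4)}")
  }
}
```

Both lines printed by `main` should show the same value, since the parallel version only changes how the work is scheduled, not what is computed. For a workload this small, the overhead of creating futures is likely to outweigh any gain from the extra cores.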
