- Scala for Data Science
- Pascal Bugnion
- 2021-07-23 14:33:07
Chapter 4. Parallel Collections and Futures
Data science often involves processing medium or large amounts of data. The once-exponential growth in the speed of individual CPUs has slowed while the amount of data continues to increase, so using computers effectively now requires parallel computation.
In this chapter, we will look at ways of parallelizing computation and data processing on a single computer. Virtually all new computers have more than one processing unit, and distributing a calculation over these cores can be an effective way of speeding up medium-sized calculations.
Parallelizing calculations over a single chip is suitable for calculations involving gigabytes or a few terabytes of data. For larger data flows, we must resort to distributing the computation over several computers in parallel. We will discuss Apache Spark, a framework for parallel data processing, in Chapter 10, Distributed Batch Processing with Spark.
In this book, we will look at three common ways of leveraging parallel architectures in a single machine: parallel collections, futures, and actors. We will consider the first two in this chapter, and leave the study of actors to Chapter 9, Concurrency with Akka.
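To give a first taste of what distributing a calculation over several cores can look like, here is a minimal sketch using futures from the Scala standard library. It is only an illustration under stated assumptions: the object name `ParallelSumSketch`, the helper `sumOfSquares`, the input data, and the chunking strategy are all hypothetical and chosen for brevity. It is not the approach developed in the rest of the chapter.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelSumSketch {
  // Hypothetical helper: the work done on one chunk of the input.
  def sumOfSquares(chunk: Seq[Long]): Long = chunk.map(x => x * x).sum

  def main(args: Array[String]): Unit = {
    // Illustrative input; real workloads would be far larger.
    val data: Vector[Long] = (1L to 1000L).toVector

    // Split the input into roughly one chunk per available core.
    val cores = Runtime.getRuntime.availableProcessors()
    val chunkSize = math.max(1, data.size / cores)

    // One future per chunk; the global execution context
    // schedules them on a pool of worker threads.
    val partials: Vector[Future[Long]] =
      data.grouped(chunkSize).map(chunk => Future(sumOfSquares(chunk))).toVector

    // Combine the partial sums once every future has completed.
    val total: Long =
      Await.result(Future.sequence(partials).map(_.sum), 10.seconds)

    // The parallel result must match the sequential one.
    assert(total == sumOfSquares(data))
    println(total)
  }
}
```

Blocking with `Await.result` keeps the sketch short. In a longer-running program, one would more likely attach callbacks or compose the futures instead of blocking the calling thread.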