- Scala for Data Science
- Pascal Bugnion
- 2021-07-23 14:33:07
Chapter 4. Parallel Collections and Futures
Data science often involves processing medium or large amounts of data. Since the previously exponential growth in the speed of individual CPUs has slowed down and the amount of data continues to increase, leveraging computers effectively must entail parallel computation.
In this chapter, we will look at ways of parallelizing computation and data processing on a single computer. Virtually all new computers have more than one processing unit, and distributing a calculation over these cores can be an effective way of speeding up medium-sized calculations.
Parallelizing calculations on a single machine is suitable for calculations involving gigabytes or a few terabytes of data. For larger data flows, we must resort to distributing the computation over several computers in parallel. We will discuss Apache Spark, a framework for parallel data processing, in Chapter 10, Distributed Batch Processing with Spark.
In this book, we will look at three common ways of leveraging parallel architectures on a single machine: parallel collections, futures, and actors. We will consider the first two in this chapter, and leave the study of actors to Chapter 9, Concurrency with Akka.
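As a first taste of what distributing a calculation over several cores can look like, here is a minimal sketch using futures from the Scala standard library. The object `ParallelSumSketch` and the helpers `sumOfSquares` and `parallelSumOfSquares` are hypothetical names chosen for illustration, not code from this book, and the 30-second timeout is an arbitrary assumption.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelSumSketch {
  // Hypothetical helper: sum of squares over one chunk of the input.
  def sumOfSquares(chunk: Seq[Long]): Long = chunk.map(x => x * x).sum

  // Split the data into roughly nChunks pieces and process each in a Future.
  def parallelSumOfSquares(data: Seq[Long], nChunks: Int): Long = {
    val chunkSize = math.max(1, math.ceil(data.size.toDouble / nChunks).toInt)
    // Each chunk is submitted to the global execution context (a thread pool).
    val partials: Vector[Future[Long]] =
      data.grouped(chunkSize).map(chunk => Future(sumOfSquares(chunk))).toVector
    // Combine the partial results and block until the total is available.
    // The timeout is an assumption made for this sketch.
    Await.result(Future.sequence(partials).map(_.sum), 30.seconds)
  }

  def main(args: Array[String]): Unit = {
    val nCores = Runtime.getRuntime.availableProcessors
    println(parallelSumOfSquares(1L to 1000L, nCores))
  }
}
```

Blocking with `Await.result` keeps the sketch short. In a larger program one would usually keep composing futures rather than block on them.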