- Concurrent Patterns and Best Practices
- Atul S. Khot
- 2021-07-16 17:32:32
Parallel collections
Say that I am describing some new and exciting algorithm to you. I start telling you about how the algorithm exploits hash tables. We typically think of such data structures as residing entirely in memory, locked (if required), and worked on by a single thread.
For example, take a list of numbers. Say that we want to sum all these numbers. This operation could be parallelized on multiple cores by using threads.
However, we want to stay away from explicit locking. An abstraction that works concurrently on our list for us would be nice. It would split the list into sublists, run the function on each sublist, and combine the partial results at the end, as shown in the following diagram. This is the typical MapReduce paradigm in action:

The preceding diagram shows a Scala collection that has been parallelized in order to use concurrency internally.
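To make the split/run/combine idea concrete, here is a minimal hand-rolled sketch that uses only the standard library's `Future`s. The names `ParallelSum`, `parSum`, and the `chunks` parameter are hypothetical and chosen for illustration. Scala's parallel collections hide this plumbing behind `.par` (for example, `nums.par.sum`). Up to Scala 2.12 they ship with the standard library. From 2.13 onward they live in the separate `scala-parallel-collections` module and need `import scala.collection.parallel.CollectionConverters._`.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelSum {
  // Hypothetical helper: split `nums` into at most `chunks` sublists, sum each
  // sublist in its own Future on the global thread pool, then combine the results.
  def parSum(nums: Vector[Long], chunks: Int = 4): Long = {
    require(chunks > 0, "chunks must be positive")
    if (nums.isEmpty) 0L
    else {
      // Integer ceiling division, so every element lands in some chunk
      val size = (nums.size + chunks - 1) / chunks
      // "Map" step: one independent task per sublist, no shared mutable state,
      // so no explicit locking is needed
      val partials = nums.grouped(size).toVector.map(chunk => Future(chunk.sum))
      // "Reduce" step: wait for all tasks, then collate the partial sums
      Await.result(Future.sequence(partials).map(_.sum), 10.seconds)
    }
  }

  def main(args: Array[String]): Unit = {
    val nums = Vector.tabulate(1000)(i => (i + 1).toLong)
    println(parSum(nums)) // 500500
  }
}
```

Because each task only reads its own sublist and returns a value, the threads never contend for shared state. The only coordination point is the final combine step.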
What if the data structure is so large that it cannot all fit in the memory of a single machine? We could split the collection across a cluster of machines instead.
The Apache Spark framework does this for us. Spark's Resilient Distributed Dataset (RDD) is a partitioned collection that spreads the data structure across cluster machines, and thus can work on huge collections, typically to perform analytical processing.
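The following single-machine sketch illustrates what a "partitioned collection" means. Each element is assigned to a partition by hashing. The work then runs separately on every partition, and only the small per-partition results are combined. This is a simulation for illustration, not Spark's implementation: `Partitioned` and `partitionByHash` are hypothetical names, and each inner `Vector` merely stands in for data that would live on a different cluster machine. With Spark itself, something like `sc.parallelize(1L to 100L, 4).reduce(_ + _)`, where `sc` is a `SparkContext`, does the partitioning and per-partition work across the cluster for you.

```scala
object PartitionedSum {
  // Hypothetical stand-in for a distributed dataset: each inner Vector is one
  // partition that, on a real cluster, would reside on a different machine.
  final case class Partitioned[A](partitions: Vector[Vector[A]]) {
    // Reduce each non-empty partition locally, then combine only those small
    // partial results. Like a reduce over any collection, this throws if the
    // whole dataset is empty.
    def reduce(op: (A, A) => A): A =
      partitions.filter(_.nonEmpty).map(_.reduce(op)).reduce(op)
  }

  // Assign each element to a partition by its hash code; floorMod keeps
  // negative hash codes within [0, numPartitions).
  def partitionByHash[A](xs: Seq[A], numPartitions: Int): Partitioned[A] = {
    require(numPartitions > 0, "numPartitions must be positive")
    val buckets = xs.groupBy(x => java.lang.Math.floorMod(x.hashCode, numPartitions))
    Partitioned(Vector.tabulate(numPartitions)(i => buckets.getOrElse(i, Seq.empty).toVector))
  }

  def main(args: Array[String]): Unit = {
    val data = partitionByHash(1L to 100L, numPartitions = 4)
    println(data.partitions.map(_.size)) // how many elements each partition holds
    println(data.reduce(_ + _)) // 5050
  }
}
```

Moving only per-partition results, rather than the raw data, is what lets this style of processing scale to collections too large for one machine's memory.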