- Concurrent Patterns and Best Practices
- Atul S. Khot
- 312字
- 2021-07-16 17:32:27
The MapReduce pattern
The MapReduce pattern is an example of a common case where concurrency is needed. The following diagram shows a word frequency counter; given a huge text stream of trillions of words, we need to see how many times every word occurs in the text. The algorithm is super simple: we keep the running count of each word in a hash table, with the word as the key and the counter as the value. The hash table allows us to quickly look up the next word and increment the associated value (counter).
Given the size of the input text, one single node does not have the memory for the entire hash table! Concurrency provides a solution, using the MapReduce pattern, as shown in the following diagram:
The solution is divide and conquer: we maintain a distributed hash table and run the same algorithm, suitably adapted for a cluster.
The master node reads—parses—the text and pushes it to a set of slave processing nodes. The idea is to distribute the text in such a way that one word is processed by one slave node only. For example, given three processing nodes, as shown in the preceding diagram, we would divide rangewise: push nodes starting with the characters {a..j} to node 1, {k..r} to node 2 and the rest—{s..z}—onto node 3. This is the map part (distribution of work).
Once the stream is exhausted, each slave node sends its frequency result back to the master, which prints the result.
The slave nodes are all doing the same processing concurrently. Note that the algorithm would run faster if we add, more slave nodes; that is, if we scale it horizontally.
- Designing Purpose:Built Drones for Ardupilot Pixhawk 2.1
- Linux系統(tǒng)架構(gòu)與運(yùn)維實(shí)戰(zhàn)
- Windows Server 2012 Hyper-V:Deploying the Hyper-V Enterprise Server Virtualization Platform
- Ganglia系統(tǒng)監(jiān)控
- 開源安全運(yùn)維平臺(tái)OSSIM疑難解析:入門篇
- Installing and Configuring Windows 10:70-698 Exam Guide
- 混沌工程:復(fù)雜系統(tǒng)韌性實(shí)現(xiàn)之道
- Instant Optimizing Embedded Systems using Busybox
- Windows Server 2019 Administration Fundamentals
- 從實(shí)踐中學(xué)習(xí)Kali Linux無線網(wǎng)絡(luò)滲透測(cè)試
- Linux服務(wù)器配置與管理
- OpenSolaris設(shè)備驅(qū)動(dòng)原理與開發(fā)
- Windows 7實(shí)戰(zhàn)從入門到精通
- Linux內(nèi)核修煉之道
- 鴻蒙入門:HarmonyOS應(yīng)用開發(fā)