
Chapter 3. Parallelization Using Reducers

Reducers are another way of looking at collections in Clojure. In this chapter, we will study this particular abstraction of collections, and how it is quite orthogonal to viewing collections as sequences. The motivation behind reducers is to increase the performance of computations over collections. This performance gain is achieved mainly through parallelization of such computations.

As we have seen in Chapter 1, Working with Sequences and Patterns, sequences and laziness are a great way to handle collections, and the Clojure standard library provides several functions to handle and manipulate sequences. However, abstracting a collection as a sequence has an unfortunate consequence: any computation performed over all the elements of a sequence is inherently sequential. Also, all of the standard sequence functions create a new collection that is similar to the one passed to them. Yet it is often useful to perform a computation over a collection without creating a similar collection, even as an intermediate result. For example, we frequently need to reduce a given collection to a single value through a series of transformations applied iteratively. This sort of computation does not necessarily require the intermediate results of each transformation to be saved.
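
As a rough illustration (the collection and transformations here are arbitrary, not taken from the text), consider the following sequence-based pipeline. Each call to filter and map realizes its own intermediate sequence, and the whole computation runs on a single thread even though only the final sum is needed:

    ;; filter and map each produce an intermediate (lazy) sequence,
    ;; and the elements are processed strictly one after another.
    (reduce + (map inc (filter even? (range 100000))))
    ;; => 2500000000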

A consequence of iteratively computing values from a collection is that we cannot parallelize it in a straightforward way. Modern MapReduce frameworks handle this kind of computation by pipelining the elements of a collection through several transformations in parallel, and finally reducing the results to a single value. Of course, that value could just as well be a new collection. A drawback of this methodology is that it produces concrete collections as the intermediate result of each transformation, which is rather wasteful. For example, if we wanted to filter out values from a collection, the MapReduce strategy would require creating empty collections to represent the values that were filtered out, and these still have to be handled by the reduction step that produces the final result.

This incurs unnecessary memory allocation and also creates additional work for the reduction step that produces the final result. Hence, there is scope for optimizing these sorts of computations.
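
To make this concrete, here is a hypothetical sketch (the function name filter-as-map-step and the data are invented for illustration) of a filter expressed in MapReduce style: the map step emits an empty collection for every rejected element, and the reduce step still has to walk over and concatenate all of those empty collections:

    ;; Map step: emit a one-element collection for values we keep,
    ;; and an empty collection for values we drop.
    (defn filter-as-map-step [pred coll]
      (map (fn [x] (if (pred x) [x] [])) coll))

    ;; Reduce step: concatenate all the intermediate collections,
    ;; including the empty ones produced for rejected values.
    (reduce into [] (filter-as-map-step even? (range 10)))
    ;; => [0 2 4 6 8]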

This brings us to the notion of treating computations over collections as reducers to attain better performance. Of course, this doesn't mean that reducers are a replacement for sequences. Sequences and laziness are great for abstracting computations that create and manipulate collections, whereas reducers are a specialized, high-performance abstraction for cases in which a collection must be piped through several transformations and finally combined to produce a result. Reducers achieve this performance gain in the following ways:

  • Reducing the amount of memory allocated to produce the desired result
  • Parallelizing the process of reducing a collection into a single result, which could be an entirely new collection

The clojure.core.reducers namespace provides several functions for processing collections using reducers. Let's now examine how reducers are implemented, along with a few examples that demonstrate how they can be used.
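
As a small preview of the API (the collection and transformations here are arbitrary), the reducer-based counterpart of a filter-map-sum pipeline might look like the following. The r/filter and r/map calls do not realize intermediate collections, and r/fold can reduce the underlying vector in parallel:

    (require '[clojure.core.reducers :as r])

    ;; r/filter and r/map only describe the transformations; no
    ;; intermediate collections are created. r/fold performs the
    ;; reduction in parallel over the vector, combining the
    ;; partial sums with +.
    (r/fold + (r/map inc (r/filter even? (vec (range 100000)))))
    ;; => 2500000000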
