- Mastering Clojure
- Akhil Wali
- 1703 words
- 2021-07-09 20:18:03
Executing tasks in parallel
The simultaneous execution of several computations is termed parallelism. Parallelism tends to increase the overall performance of a computation, since the work can be partitioned to execute on several cores or processors. Clojure has a few functions that can be used to parallelize a particular computation or task, and we will briefly examine them in this section.
Note
The following examples can be found in src/m_clj/c2/parallel.clj of the book's source code.
Suppose we have a function that pauses the current thread for some time and then returns a computed value, as depicted in Example 2.17:
(defn square-slowly [x]
  (Thread/sleep 2000)
  (* x x))
Example 2.17: A function that pauses the current thread
The function square-slowly in Example 2.17 requires a single argument x. This function pauses the current thread for two seconds and returns the square of its argument x. If the function square-slowly is invoked over a collection of three values using the map function, it takes three times as long to complete, as shown here:
user> (time (doall (map square-slowly (repeat 3 10))))
"Elapsed time: 6000.329702 msecs"
(100 100 100)
The previously shown map form returns a lazy sequence, and hence the doall form is required to realize the value returned by the map form. We could also use the dorun form to realize a lazy sequence, although dorun discards the results and returns nil. The entire expression is evaluated in about six seconds, which is three times the time taken by a single call to the square-slowly function. We can parallelize the application of the square-slowly function using the pmap function instead of map, as shown here:
user> (time (doall (pmap square-slowly (repeat 3 10))))
"Elapsed time: 2001.543439 msecs"
(100 100 100)
The entire expression now evaluates in the same amount of time required for a single call to the square-slowly function. This is due to the square-slowly function being called in parallel over the supplied collection by the pmap form. Thus, the pmap form has the same semantics as that of the map form, except that it applies the supplied function in parallel.
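As a minimal sketch of this equivalence, the following snippet checks that map and pmap yield the same values in the same order. The helper square-after-delay and its 100 ms pause are assumptions made here to keep the run short; they are not part of the book's source code.

```clojure
;; Hypothetical helper: a shorter-running stand-in for square-slowly.
(defn square-after-delay [x]
  (Thread/sleep 100)
  (* x x))

(def inputs (range 8))

;; Sequential and parallel application over the same collection.
(def sequential-result (doall (map square-after-delay inputs)))
(def parallel-result   (doall (pmap square-after-delay inputs)))

;; pmap preserves the order of its input, so the two sequences are equal.
(println (= sequential-result parallel-result))

;; Like map, pmap also accepts several collections.
(def sums (doall (pmap + [1 2 3] [10 20 30])))
(println sums)
```

Only the elapsed time differs between the two calls. When such code is run as a standalone script rather than at the REPL, calling (shutdown-agents) at the end lets the JVM exit promptly, since pmap is built on futures.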
The pvalues and pcalls forms can also be used to parallelize computations. The pvalues form evaluates the expressions passed to it in parallel, and returns a lazy sequence of the resulting values. Similarly, the pcalls form invokes all functions passed to it, which must take no arguments, in parallel and returns a lazy sequence of the values returned by these functions:
user> (time (doall (pvalues (square-slowly 10) (square-slowly 10) (square-slowly 10))))
"Elapsed time: 2007.702703 msecs"
(100 100 100)
user> (time (doall (pcalls #(square-slowly 10) #(square-slowly 10) #(square-slowly 10))))
"Elapsed time: 2005.683279 msecs"
(100 100 100)
As shown in the preceding output, both expressions that use the pvalues and pcalls forms take the same amount of time to evaluate as a single call to the square-slowly function.
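The relationship between these forms can be sketched with a hypothetical helper, pcalls-via-pmap, which is not part of clojure.core: invoking a collection of no-argument functions in parallel amounts to mapping a "call this function" step over them with pmap.

```clojure
;; Hypothetical helper: invokes each no-argument function in parallel
;; by mapping "call it" over the functions with pmap.
(defn pcalls-via-pmap [& fns]
  (pmap (fn [f] (f)) fns))

(def results
  (doall (pcalls-via-pmap #(* 10 10)
                          #(+ 1 2)
                          #(str "a" "b"))))
(println results)
```

The results come back in the order in which the functions were supplied, regardless of which one finishes first.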
Note
The pmap, pvalues, and pcalls forms all return lazy sequences that have to be realized using the doall or dorun form.
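The difference between the two realizing forms can be seen in a short sketch. The names calls and square-and-count are assumptions introduced for illustration: doall returns the realized sequence, whereas dorun runs the computation only for its side effects and returns nil.

```clojure
;; Counts how many times the mapped function has been invoked.
(def calls (atom 0))

(defn square-and-count [x]
  (swap! calls inc)
  (* x x))

;; doall realizes the sequence and returns it.
(def kept (doall (pmap square-and-count [1 2 3])))

;; dorun realizes the sequence but discards the values.
(def discarded (dorun (pmap square-and-count [1 2 3])))

(println kept discarded @calls)
```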
Controlling parallelism with thread pools
The pmap form schedules parallel execution of the supplied function on the default threadpool. If we wish to configure or tweak the threadpool used by pmap, the claypoole library (https://github.com/TheClimateCorporation/claypoole) is a good option. This library provides an implementation of the pmap form that must be passed a configurable threadpool. We will now demonstrate how we can use this library to parallelize a given function.
Note
The following library dependencies are required for the upcoming examples:
[com.climate/claypoole "1.0.0"]
Also, the following namespaces must be included in your namespace declaration:
(ns my-namespace
  (:require [com.climate.claypoole :as cp]
            [com.climate.claypoole.lazy :as cpl]))
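For projects that use the Clojure CLI rather than Leiningen, the same coordinate can be written in deps.edn form. This is an assumption about the reader's tooling, not something the book prescribes.

```clojure
;; deps.edn (assumed tooling: Clojure CLI); same coordinate as above.
{:deps {com.climate/claypoole {:mvn/version "1.0.0"}}}
```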
The pmap function from the com.climate.claypoole namespace is essentially a variant of the standard pmap function to which we supply a threadpool instance to be used in parallelizing a given function. We can also supply the number of threads to be used by this variant of the pmap function in order to parallelize a given function, as shown here:
user> (time (doall (cp/pmap 2 square-slowly [10 10 10])))
"Elapsed time: 4004.029789 msecs"
(100 100 100)
As previously shown, the pmap function from the claypoole library can be used to parallelize the square-slowly function that we defined earlier in Example 2.17 over a collection of three values. With only two threads available, the three elements are processed in two rounds: the first two are computed in parallel on separate threads, and the third is computed once a thread becomes free. Since the square-slowly function takes two seconds to complete, the total time taken to compute over the collection of three elements is around four seconds.
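The effect of a bounded number of threads can be reproduced without claypoole. The following sketch uses plain java.util.concurrent interop; pooled-map, timed, and square-after-delay are hypothetical helpers written for this illustration, and the pause is shortened to 100 ms. It is not how claypoole is implemented, only a demonstration of the same arithmetic: three tasks on two threads need two rounds.

```clojure
(import '(java.util.concurrent Executors ExecutorService Future))

;; Hypothetical helper: a shorter-running stand-in for square-slowly.
(defn square-after-delay [x]
  (Thread/sleep 100)
  (* x x))

(defn pooled-map
  "Hypothetical helper: applies f to each item of coll on a fixed pool of
  n threads and returns the results in input order."
  [n f coll]
  (let [^ExecutorService pool (Executors/newFixedThreadPool (int n))]
    (try
      (let [tasks   (mapv (fn [x] (fn [] (f x))) coll) ; Clojure fns are Callables
            futures (.invokeAll pool tasks)]           ; blocks until all are done
        (mapv (fn [^Future fut] (.get fut)) futures))
      (finally
        (.shutdown pool)))))

(defn timed
  "Hypothetical helper: runs thunk and reports its result and elapsed time."
  [thunk]
  (let [start  (System/nanoTime)
        result (thunk)]
    {:result result
     :ms     (/ (- (System/nanoTime) start) 1e6)}))

;; Three tasks on two threads: the third task waits for a free thread.
(def run (timed #(pooled-map 2 square-after-delay [10 10 10])))
(println run)
```

The reported time should be roughly twice the pause of a single call, mirroring the four seconds observed above for a two-second function.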
We can create an instance of a pool of threads using the threadpool function from the claypoole library. This threadpool instance can then be passed to the pmap function from the claypoole library. The com.climate.claypoole namespace also provides the ncpus function, which returns the number of processors available to the current process as reported by the JVM. We can create a threadpool instance and pass it to this variant of the pmap function as shown here:
user> (def pool (cp/threadpool (cp/ncpus)))
#'user/pool
user> (time (doall (cp/pmap pool square-slowly [10 10 10])))
"Elapsed time: 4002.05885 msecs"
(100 100 100)
Assuming that we are running the preceding code on a computer system that has two available processors, the call to the threadpool function shown previously will create a threadpool of two threads. This threadpool instance can then be passed to the pmap function as shown in the preceding example.
Note
We can fall back to the standard behavior of the pmap function by passing the :builtin keyword as the first argument to the com.climate.claypoole/pmap function. Similarly, if the keyword :serial is passed as the first argument to the claypoole version of the pmap function, the function behaves like the standard map function.
The threadpool function also supports a couple of useful key options. Firstly, we can create a pool of non-daemon threads using the :daemon false optional argument. Daemon threads do not keep the JVM alive and are killed when the process exits, and all threadpools created by the threadpool function are pools of daemon threads by default. We can also name a threadpool using the :name key option of the threadpool function. The :thread-priority key option can be used to indicate the priority of the threads in the new threadpool.
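What these three options mean at the JVM level can be sketched with a java.util.concurrent.ThreadFactory. The thread-factory helper and the pool name "squares" are assumptions introduced here; this illustrates the underlying Java thread properties rather than claypoole's own code.

```clojure
(import '(java.util.concurrent Executors ExecutorService Future ThreadFactory))

(defn thread-factory
  "Hypothetical helper: builds threads with a name prefix, a daemon flag,
  and a priority, the three properties discussed above."
  [pool-name daemon? priority]
  (let [counter (atom 0)]
    (reify ThreadFactory
      (newThread [_ runnable]
        (doto (Thread. ^Runnable runnable
                       (str pool-name "-" (swap! counter inc)))
          (.setDaemon (boolean daemon?))
          (.setPriority (int priority)))))))

(def ^ExecutorService pool
  (Executors/newFixedThreadPool
   2 (thread-factory "squares" true Thread/NORM_PRIORITY)))

;; Ask a pooled thread to describe itself.
(def thread-info
  (let [task (fn []
               (let [t (Thread/currentThread)]
                 {:name     (.getName t)
                  :daemon   (.isDaemon t)
                  :priority (.getPriority t)}))
        ^Future fut (first (.invokeAll pool [task]))]
    (.get fut)))

(.shutdown pool)
(println thread-info)
```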
Tasks can also be prioritized using the pmap, priority-threadpool, and with-priority forms from the claypoole library. A priority threadpool is created using the priority-threadpool function, and this new threadpool can be used along with the with-priority function to assign a priority to a task that must be parallelized using pmap, as shown here:
user> (def pool (cp/priority-threadpool (cp/ncpus)))
#'user/pool
user> (def task-1 (cp/pmap (cp/with-priority pool 1000) square-slowly [10 10 10]))
#'user/task-1
user> (def task-2 (cp/pmap (cp/with-priority pool 0) square-slowly [5 5 5]))
#'user/task-2
Tasks with higher priority are assigned to threads first. Hence, the task represented by task-1 will be assigned to a thread of execution before the task represented by task-2 in the preceding example.
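The ordering rule itself can be illustrated with a java.util.concurrent.PriorityBlockingQueue. This is a sketch of the idea only, under the assumption that waiting tasks are held in a priority-ordered queue; the task maps and the name task-3 are made up for the example.

```clojure
(import '(java.util.concurrent PriorityBlockingQueue))

;; Waiting tasks ordered so that the highest :priority is taken first.
(def ^PriorityBlockingQueue task-queue
  (PriorityBlockingQueue.
   11
   (fn [a b] (compare (:priority b) (:priority a)))))

;; Tasks are queued in arbitrary order.
(.offer task-queue {:name :task-2 :priority 0})
(.offer task-queue {:name :task-1 :priority 1000})
(.offer task-queue {:name :task-3 :priority 10})

;; A free worker thread would take tasks in this order.
(def run-order
  (vec (repeatedly 3 #(:name (.poll task-queue)))))
(println run-order)
```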
To gracefully deallocate a given threadpool, we can call the shutdown function from the com.climate.claypoole namespace, which accepts a threadpool instance as its only argument. The shutdown! function from the same namespace will forcibly shut down the threads in a threadpool. The shutdown! function can also be called using the with-shutdown! macro. We specify the threadpools to be used for a series of computations as a vector of bindings to the with-shutdown! macro. This macro will implicitly call the shutdown! function on all of the threadpools that it has created once all the computations in the body of this macro are completed. For example, we can define a function to create a threadpool, use it for a computation, and finally, shut down the threadpool, using the with-shutdown! macro as shown in Example 2.18:
(defn square-slowly-with-pool [v]
  (cp/with-shutdown! [pool (cp/threadpool (cp/ncpus))]
    (doall (cp/pmap pool square-slowly v))))
Example 2.18: Using the with-shutdown! macro
The square-slowly-with-pool function defined in Example 2.18 will create a new threadpool, represented by pool, and then use it to call the pmap function. The shutdown! function is implicitly called once the doall form completely evaluates the lazy sequence returned by the pmap function.
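The create, use, and shut down pattern can be sketched as a small macro over a Java ExecutorService. The macro with-fixed-pool and the function square-with-pool are hypothetical names introduced here; the sketch shows the try/finally shape of such a macro and is not claypoole's actual definition.

```clojure
(defmacro with-fixed-pool
  "Hypothetical macro: binds pool-sym to a fixed pool of n threads, runs
  body, and always shuts the pool down afterwards."
  [[pool-sym n] & body]
  `(let [~pool-sym (java.util.concurrent.Executors/newFixedThreadPool (int ~n))]
     (try
       ~@body
       (finally
         ;; Forcible shutdown, even if the body throws.
         (.shutdownNow ~pool-sym)))))

;; Remembers the pool so that its state can be inspected afterwards.
(def seen-pool (atom nil))

(defn square-with-pool [v]
  (with-fixed-pool [pool 2]
    (reset! seen-pool pool)
    (let [tasks (mapv (fn [x] (fn [] (* x x))) v)]
      ;; The results are collected eagerly, before the pool is shut down.
      (mapv (fn [^java.util.concurrent.Future fut] (.get fut))
            (.invokeAll pool tasks)))))

(def squares (square-with-pool [1 2 3]))
(println squares)
```

Collecting the results eagerly inside the body matters here for the same reason that doall is used in Example 2.18: the pool no longer exists once the body has returned.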
The claypoole library also supports unordered parallelism, in which results of individual threads of computation are used as soon as they are available in order to minimize latency. The com.climate.claypoole/upmap function is an unordered parallel version of the pmap function.
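Unordered parallelism can be illustrated with java.util.concurrent.ExecutorCompletionService, which hands back results in completion order. The unordered-map helper below is a hypothetical stand-in written for this sketch and is not claypoole's upmap; the sleep durations are arbitrary.

```clojure
(import '(java.util.concurrent ExecutorCompletionService Executors
                               ExecutorService Future))

(defn unordered-map
  "Hypothetical helper: applies f to each item of coll on n threads and
  returns the results in the order in which they complete."
  [n f coll]
  (let [^ExecutorService pool (Executors/newFixedThreadPool (int n))
        ecs   (ExecutorCompletionService. pool)
        items (vec coll)]
    (try
      (doseq [x items]
        (.submit ecs (fn [] (f x))))
      ;; take blocks until the next task, whichever it is, has finished.
      (vec (repeatedly (count items)
                       (fn [] (.get ^Future (.take ecs)))))
      (finally
        (.shutdown pool)))))

;; Each task sleeps for the given number of milliseconds and returns it.
(def completion-order
  (unordered-map 3
                 (fn [ms] (Thread/sleep (long ms)) ms)
                 [400 50 200]))
(println completion-order)
```

With one thread per task, the shortest task is reported first even though it was submitted second.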
The com.climate.claypoole namespace also provides several other functions that use threadpools, as described here:
- The com.climate.claypoole/pvalues function is a threadpool-based implementation of the pvalues function. It will evaluate its arguments in parallel using a supplied threadpool and return a lazy sequence.
- The com.climate.claypoole/pcalls function is a threadpool-based version of the pcalls function, which invokes several no-argument functions to return a lazy sequence.
- A future that uses a given threadpool can be created using the com.climate.claypoole/future function.
- We can evaluate an expression in a parallel fashion over the items in a given collection using the com.climate.claypoole/pfor function.
- The upvalues, upcalls, and upfor functions in the com.climate.claypoole namespace are unordered parallel versions of the pvalues, pcalls, and pfor functions, respectively, from the same namespace.
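The idea behind a parallel for can be approximated with core functions alone. The sketch below is an assumption-laden illustration rather than claypoole's pfor: the bindings are generated serially with for, and only the body expression is evaluated in parallel with the built-in pmap.

```clojure
;; The bindings are produced serially by for; the body (* x y) runs in
;; parallel over the resulting pairs.
(def products
  (doall
   (pmap (fn [[x y]] (* x y))
         (for [x (range 1 4)
               y (range 1 3)]
           [x y]))))
(println products)
```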
The pmap function from the com.climate.claypoole namespace eagerly evaluates the collection it is supplied. This may be undesirable when we intend to call pmap over an infinite sequence. The com.climate.claypoole.lazy namespace provides versions of pmap and other functions from the com.climate.claypoole namespace that preserve the laziness of a supplied collection. The lazy version of the pmap function can be demonstrated as follows:
user> (def lazy-pmap (cpl/pmap pool square-slowly (range)))
#'user/lazy-pmap
user> (time (doall (take 4 lazy-pmap)))
"Elapsed time: 4002.556548 msecs"
(0 1 4 9)
The previously defined lazy-pmap sequence is a lazy sequence created by mapping the square-slowly function over the infinite sequence (range). As shown previously, the call to the pmap function returns immediately, and the first four elements of the resulting lazy sequence are realized in parallel using the doall and take functions.
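For comparison, the built-in pmap is semi-lazy as well: it stays a bounded number of elements ahead of the consumer and can therefore also be applied to an infinite sequence. A minimal sketch using only clojure.core, with names chosen here for illustration:

```clojure
;; Defining the sequence returns at once, although (range) is infinite.
(def lazy-squares (pmap (fn [x] (* x x)) (range)))

;; Only the elements that are consumed (plus a bounded look-ahead)
;; are ever computed.
(def first-four (doall (take 4 lazy-squares)))
(println first-four)
```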
To summarize, Clojure has the pmap, pvalues, and pcalls primitives to deal with parallel computations. If we intend to control the amount of parallelism utilized by these functions, we can use the claypoole library's implementations of these primitives. The claypoole library also supports other useful features such as prioritized threadpools and unordered parallelism.