- Distributed Computing with Python
- Francesco Pierfederici
Chapter 2. Asynchronous Programming
In this chapter, we are finally going to write some code! The code in this chapter and in all the chapters that follow is written for Python 3.5 (the current release at the time of writing). When modules, syntax, or language constructs are not available in earlier versions of Python (for example, Python 2.7), this will be pointed out. In general, however, the code presented here should work on Python 2.x with some minor modifications.
Let's go back to some of the ideas presented in the previous chapter. We know that we can structure our algorithms and programs so that they can run on a local machine or on one or more computers across a network. Even when our code runs on a single machine, as we saw, we can use multiple threads and/or multiple processes so that its various parts can run at the same time on multiple CPUs.
We will now pause thinking about multiple CPUs and instead look at a single thread/process of execution. There is a programming style called asynchronous or nonblocking programming that, in specific cases, leads to quite impressive performance gains when compared to the more traditional synchronous (or blocking) programming style.
Any computer program is conceptually composed of multiple tasks, each performing an operation. We can think of these tasks as functions and methods, or even as the individual steps that make up those functions. Examples of tasks could be dividing one number by another, printing something on a screen, or something more complex, like fetching data from the Web.
Let's look at how these tasks use a CPU in a typical program. Consider a generic piece of software composed of four tasks: A, B, C, and D. What these tasks do is not important at this stage; let's just assume that each of them does some computation and some I/O. The most intuitive way of organizing these four tasks is to invoke them sequentially. The following figure shows, schematically, the CPU utilization of this simple four-task application:
[Figure: CPU utilization of the four tasks invoked sequentially]
What we see in the preceding figure is that while each task performs its I/O operations, the CPU is sitting idle, waiting for the task to restart its computation. This leaves the CPU idle for a comparatively large amount of time.
The key observation here is that there is a dramatic difference (several orders of magnitude) in the speed at which we can move data from various components, such as disks, RAM, and the network, to the CPU.
The consequence of this massive difference in component bandwidth is that any code that handles significant I/O (disk access, network communication, and so on) has the risk of keeping the CPU idle for a large fraction of its execution time (as illustrated in the preceding figure).
The ideal situation would be to arrange our tasks so that when one task is waiting on I/O (that is, it is blocking), it is somehow suspended, and another one takes over the CPU. This is exactly what asynchronous (also called event-driven) programming is all about.
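As a concrete sketch of this idea, the following minimal example uses Python 3.5's asyncio module; the task names and delays are illustrative, and asyncio.sleep() stands in for real I/O. When task A suspends itself to wait, the event loop hands the CPU to task B:

```python
import asyncio

order = []  # record of when each task starts and finishes

async def task(name, io_delay):
    order.append(name + ':start')
    await asyncio.sleep(io_delay)  # simulated I/O: give up the CPU
    order.append(name + ':end')

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.gather(task('A', 0.2), task('B', 0.1)))
# A starts first, but B (with the shorter "I/O" wait) finishes first:
print(order)  # ['A:start', 'B:start', 'B:end', 'A:end']
```

Note that neither task is interrupted by the other: each runs until it reaches an `await` and only then gives up control.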
The following figure describes pictorially the reorganization of our conceptual four-task example using asynchronous programming:
[Figure: CPU utilization of the four tasks using asynchronous programming]
Here, the four tasks are still invoked sequentially, but instead of each one blocking, reserving the CPU for itself until it is done, they all voluntarily give up the CPU when they do not need it (because they are waiting on data). While there are still times when the CPU is idle, the overall runtime of our program is now noticeably shorter.
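The speedup can be measured directly. In this sketch (again simulating I/O with asyncio.sleep(), so the exact numbers are illustrative), running four such tasks sequentially takes roughly the sum of their waits, while running them asynchronously takes roughly the longest single wait:

```python
import asyncio
import time

async def task(name):
    await asyncio.sleep(0.2)  # simulated I/O wait
    return name

loop = asyncio.get_event_loop()

start = time.time()
for name in 'ABCD':                # sequential: each wait blocks the next
    loop.run_until_complete(task(name))
sequential = time.time() - start   # roughly 4 * 0.2 s

start = time.time()
loop.run_until_complete(           # asynchronous: the waits overlap
    asyncio.gather(*[task(name) for name in 'ABCD']))
concurrent = time.time() - start   # roughly 0.2 s

print('sequential: %.2fs, asynchronous: %.2fs' % (sequential, concurrent))
```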
It might be obvious, but it is worth pointing out that multithreaded programming would allow us to achieve the same efficiency by running tasks in parallel in different threads. There is a fundamental difference here, however: in a multithreaded program, the operating system decides exactly which threads are active and when they are superseded. In asynchronous programming, instead, each task can decide when to give up the CPU and therefore suspend its execution.
In addition, with asynchronous programming alone we do not achieve true parallelism; there is still only one task running at any given time, which removes most race conditions from our code. Of course, nothing is stopping us from mixing paradigms and using multiple threads and/or multiple processes together with asynchronous techniques within each single thread or process.
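For instance (a sketch, with a made-up `blocking_fetch` function standing in for a library call that has no asynchronous interface), asyncio lets us push a blocking call onto a thread pool with run_in_executor() while the event loop keeps running other coroutines:

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_fetch():
    # Hypothetical stand-in for a blocking call (e.g. a legacy
    # database driver); run directly, it would stall the event loop.
    time.sleep(0.1)
    return 'data'

async def main(loop, executor):
    # The blocking call runs in a worker thread; awaiting the
    # returned future suspends this coroutine, not the whole loop.
    return await loop.run_in_executor(executor, blocking_fetch)

loop = asyncio.get_event_loop()
executor = ThreadPoolExecutor(max_workers=2)
result = loop.run_until_complete(main(loop, executor))
print(result)  # 'data'
```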
Another thing to keep in mind is that asynchronous programming really shines when dealing with I/O rather than CPU-intensive tasks (since there is not really a performance gain in suspending busy tasks).
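The following sketch makes that point with timings (I/O is again simulated with asyncio.sleep(), and the exact numbers will vary by machine): two I/O-bound coroutines overlap their waits, whereas a CPU-bound coroutine never reaches an `await` and so never gives up the CPU:

```python
import asyncio
import time

async def cpu_bound():
    # Busy the whole time: there is no await, so this task never
    # yields and the event loop cannot run anything else meanwhile.
    return sum(i * i for i in range(10**6))

async def io_bound():
    await asyncio.sleep(0.2)  # simulated I/O: suspends this task
    return 'done'

loop = asyncio.get_event_loop()

start = time.time()
loop.run_until_complete(asyncio.gather(io_bound(), io_bound()))
io_elapsed = time.time() - start   # ~0.2 s: the two waits overlap

start = time.time()
loop.run_until_complete(asyncio.gather(cpu_bound(), cpu_bound()))
cpu_elapsed = time.time() - start  # ~2x a single run: no overlap

print('I/O-bound: %.2fs, CPU-bound: %.2fs' % (io_elapsed, cpu_elapsed))
```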