
The difficulties of asynchronous programming

Losing control of asynchronous code in JavaScript is undoubtedly easy. Closures and in-place definitions of anonymous functions allow for a smooth programming experience that doesn't require the developer to jump to other points in the codebase. This is perfectly in line with the KISS principle (Keep It Simple, Stupid); it's simple, it keeps the code flowing, and we get it working in less time. Unfortunately, sacrificing qualities such as modularity, reusability, and maintainability will, sooner or later, lead to the uncontrolled proliferation of callback nesting, functions growing in size, and poor code organization. Most of the time, creating callbacks as in-place functions is not strictly required, so it's more a matter of discipline than a problem related to asynchronous programming. Recognizing that our code is becoming unwieldy or, even better, knowing in advance that it might become unwieldy and then acting accordingly with the most adequate solution, is what differentiates a novice from an expert.

Creating a simple web spider

To explain this problem, we will create a little web spider, a command-line application that takes in a web URL as input and downloads its contents locally into a file. In the code presented in this chapter, we are going to use a couple of npm dependencies: superagent, an HTTP client library, and mkdirp, a small utility that creates directories recursively.

Also, we will often refer to a local module named ./utils.js, which contains some helpers that we will be using in our application. We will omit the contents of this file for brevity, but you can find the full implementation, along with a package.json file containing the full list of dependencies, in the official repository at nodejsdp.link/repo.

The core functionality of our application is contained inside a module named spider.js. Let's see how it looks. To start with, let's load all the dependencies that we are going to use:

import fs from 'fs'
import path from 'path'
import superagent from 'superagent'
import mkdirp from 'mkdirp'
import { urlToFilename } from './utils.js'

Next, let's create a new function named spider(), which takes in the URL to download and a callback function that will be invoked when the download process completes:

export function spider (url, cb) {
  const filename = urlToFilename(url)
  fs.access(filename, err => {                                // (1)
    if (err && err.code === 'ENOENT') {
      console.log(`Downloading ${url} into ${filename}`)
      superagent.get(url).end((err, res) => {                 // (2)
        if (err) {
          cb(err)
        } else {
          mkdirp(path.dirname(filename), err => {             // (3)
            if (err) {
              cb(err)
            } else {
              fs.writeFile(filename, res.text, err => {       // (4)
                if (err) {
                  cb(err)
                } else {
                  cb(null, filename, true)
                }
              })
            }
          })
        }
      })
    } else {
      cb(null, filename, false)
    }
  })
}

There is a lot going on here, so let's discuss in more detail what happens in every step:

  1. The code checks whether the URL was already downloaded by verifying that the corresponding file does not exist yet. If err is defined and its code property is ENOENT, then the file does not exist and it's safe to create it:
    fs.access(filename, err => ...
    
  2. If the file is not found, the URL is downloaded using the following line of code:
    superagent.get(url).end((err, res) => ...
    
  3. Then, we make sure that the directory that will contain the file exists:
    mkdirp(path.dirname(filename), err => ...
    
  4. Finally, we write the body of the HTTP response to the filesystem:
    fs.writeFile(filename, res.text, err => ...
    

To complete our web spider application, we just need to invoke the spider() function by providing a URL as an input (in our case, we read it from the command-line arguments). The spider() function is exported from the file we defined previously. Let's now create a new file called spider-cli.js that can be directly invoked from the command line:

import { spider } from './spider.js'
spider(process.argv[2], (err, filename, downloaded) => {
  if (err) {
    console.error(err)
  } else if (downloaded) {
    console.log(`Completed the download of "${filename}"`)
  } else {
    console.log(`"${filename}" was already downloaded`)
  }
})

Now, we are ready to try our web spider application, but first, make sure you have the utils.js module and the package.json file containing the full list of dependencies in your project directory. Then, install all the dependencies by running the following command:

npm install

Now, let's execute the spider-cli.js module to download the contents of a web page with a command like this:

node spider-cli.js http://www.example.com

Our web spider application requires that we always include the protocol (for example, http://) in the URL we provide. Also, do not expect HTML links to be rewritten or resources such as images to be downloaded, as this is just a simple example to demonstrate how asynchronous programming works.
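One way to reject input without a protocol before any work starts is sketched below. The hasHttpProtocol() helper is hypothetical and is not part of the spider code; it only relies on the global URL class, which throws when given a string that is not an absolute URL.

```javascript
// Hypothetical guard, not part of spider.js or utils.js
function hasHttpProtocol (input) {
  try {
    const { protocol } = new URL(input)
    return protocol === 'http:' || protocol === 'https:'
  } catch {
    // new URL() throws a TypeError when the input is not an absolute URL
    return false
  }
}

console.log(hasHttpProtocol('http://www.example.com')) // true
console.log(hasHttpProtocol('www.example.com'))        // false
```

A check like this could run in spider-cli.js before spider() is invoked, so that a malformed argument produces a clear message instead of a failure deeper in the call chain.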

In the next section, you will learn how to improve the readability of this code and, in general, how to keep callback-based code as clean and readable as possible.

Callback hell

Looking at the spider() function we defined earlier, you will likely notice that even though the algorithm we implemented is really straightforward, the resulting code has several levels of indentation and is very hard to read. Implementing a similar function with a direct-style blocking API would be straightforward, and most likely, the code would be much more readable. However, using asynchronous continuation-passing style (CPS) is another story, and making bad use of in-place callback definitions can lead to incredibly bad code.
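To make the comparison concrete, here is a rough sketch of what a direct-style version of the same algorithm might look like. The io object and its four methods (exists, download, ensureDir, and write) are hypothetical blocking helpers invented for this illustration, and the filename is passed in rather than computed with urlToFilename(), so that the snippet stands on its own.

```javascript
// Direct-style sketch of spider(). The `io` helpers are hypothetical
// blocking operations; they are not part of Node.js or of spider.js.
function spiderSync (url, filename, io) {
  if (io.exists(filename)) {
    return { filename, downloaded: false }
  }
  const body = io.download(url) // would block until the response arrives
  io.ensureDir(filename)        // would create the parent directory if missing
  io.write(filename, body)      // would block until the file is written
  return { filename, downloaded: true }
}

// Usage with in-memory stand-ins for the blocking helpers
const files = new Map()
const io = {
  exists: name => files.has(name),
  download: url => `<html>content of ${url}</html>`,
  ensureDir: () => {},
  write: (name, body) => files.set(name, body)
}

const first = spiderSync('http://www.example.com', 'www.example.com.html', io)
const second = spiderSync('http://www.example.com', 'www.example.com.html', io)
console.log(first.downloaded, second.downloaded) // true false
```

The steps read from top to bottom at a single level of indentation, and a failure in any helper would surface as a thrown exception rather than through repeated if (err) branches.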

The situation where the abundance of closures and in-place callback definitions transforms the code into an unreadable and unmanageable blob is known as callback hell. It's one of the most widely recognized and severe anti-patterns in Node.js and JavaScript in general. The typical structure of code affected by this problem looks as follows:

asyncFoo(err => {
  asyncBar(err => {
    asyncFooBar(err => {
      //...
    })
  })
})

You can see how code written in this way assumes the shape of a pyramid due to deep nesting, and that's why it is also colloquially known as the pyramid of doom.

The most evident problem with code such as the preceding snippet is its poor readability. Due to the nesting being so deep, it's almost impossible to keep track of where a function ends and where another one begins.

Another issue is caused by the overlapping of the variable names used in each scope. Often, we have to use similar or even identical names to describe the content of a variable. The best example is the error argument received by each callback. Some people try to use variations of the same name to differentiate the object in each scope, for example, err, error, err1, err2, and so on. Others prefer to just shadow the variable defined in the upper scope by always using the same name, for example, err. Both alternatives are far from perfect; they cause confusion and increase the probability of introducing defects.
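The following sketch shows the kind of defect that shadowing can hide. The stepOne() and stepTwo() functions are hypothetical asynchronous operations that follow the Node.js callback convention; they exist only for this illustration.

```javascript
// Hypothetical async steps: the first one fails, the second one succeeds
function stepOne (cb) {
  setTimeout(() => cb(new Error('stepOne failed')), 0)
}

function stepTwo (cb) {
  setTimeout(() => cb(null), 0)
}

function run (done) {
  stepOne(err => {
    // Defect: the outer err is never checked before moving on
    stepTwo(err => {
      // This err shadows the outer one, so the stepOne failure
      // is no longer reachable from this scope
      done(err)
    })
  })
}

run(err => {
  console.log(err) // null, even though stepOne failed
})
```

Because both callbacks use the same parameter name, nothing in the inner scope hints that an earlier error was silently dropped.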

Also, we have to keep in mind that closures come at a small price in terms of performance and memory consumption. In addition, they can create memory leaks that are not very easy to identify. In fact, we shouldn't forget that any context referenced by an active closure cannot be garbage collected.
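As a small illustration of this retention, consider the two hypothetical factory functions below. Both names and the array size are made up for the example, and the snippet only shows which variables each closure references; it does not measure memory.

```javascript
function makeLeakyReporter () {
  const samples = new Array(1e6).fill(0)
  // The returned closure references `samples`, so the whole array stays
  // reachable for as long as the closure itself is reachable
  return () => samples.length
}

function makeLeanReporter () {
  const samples = new Array(1e6).fill(0)
  const count = samples.length // copy out only what is needed
  // This closure references only `count`, so it does not keep
  // `samples` reachable after makeLeanReporter() returns
  return () => count
}

const leaky = makeLeakyReporter()
const lean = makeLeanReporter()
console.log(leaky(), lean()) // 1000000 1000000
```

Both reporters return the same value, which is exactly why this kind of retention is hard to spot: the difference shows up only in what the garbage collector is allowed to reclaim.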

For a great introduction to how closures work in V8, you can refer to the blog post by Vyacheslav Egorov, a software engineer at Google working on V8, at nodejsdp.link/v8-closures.

If you look at our spider() function, you will notice that it clearly represents a callback hell situation and has all the problems just described. That's exactly what we are going to fix with the patterns and techniques that are covered in the following sections of this chapter.
