
  • Node.js Web Development
  • David Herron
  • 2021-06-11 18:48:18

Finding and loading modules using require and import

In the course of learning about modules for Node.js, we've used the require and import features without going into detail about how modules are found and all the options available. The algorithm for finding Node.js modules is very flexible. It supports finding modules that are siblings of the currently executing module, or have been installed local to the current project, or have been installed globally.

For both require and import, the command takes a module identifier. The algorithm Node.js uses is in charge of resolving the module identifier into a file containing the module, so that Node.js can load the module.

The official documentation for this is in the Node.js documentation, at https://nodejs.org/api/modules.html.

The official documentation for ES6 modules also discusses how the algorithm differs, at https://nodejs.org/api/esm.html.

Understanding the module resolution algorithm is one key to success with Node.js. This algorithm determines how best to structure the code in a Node.js application. While debugging problems with loading the correct version of a given package, we need to know how Node.js finds packages.

First, we must consider several types of modules, starting with the simple file modules we've already used.

Understanding File modules

The CommonJS and ES6 modules we've just looked at are what the Node.js documentation describes as file modules. Such modules are contained within a single file, whose filename ends with .js, .cjs, .mjs, .json, or .node. Files ending in .node are compiled from C or C++ source code, or even other languages such as Rust, while the others are, of course, written in JavaScript or JSON.

The module identifier of a file module must start with ./ or ../. This signals Node.js that the module identifier refers to a local file. As should already be clear, this module identifier refers to a pathname relative to the currently executing module.

It is also possible to use an absolute pathname as the module identifier. In a CommonJS module, such an identifier might be /path/to/some/directory/my-module.js. In an ES6 module, since the module identifier is actually a URL, then we must use a file:// URL like file:///path/to/some/directory/my-module.mjs. There are not many cases where we would use an absolute module identifier, but the capability does exist.
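The two forms of an absolute identifier can be converted into one another with the url core module. The following is a minimal sketch; the pathname is the placeholder used above and does not need to exist for the conversion to work.

```javascript
const { pathToFileURL, fileURLToPath } = require('url');

// Placeholder pathname from the text; nothing is loaded from it here.
const absolutePath = '/path/to/some/directory/my-module.mjs';

// The form that an ES6 import statement or import() expects.
const moduleUrl = pathToFileURL(absolutePath).href;
console.log(moduleUrl);

// And back to a plain pathname, the form that require() takes.
const roundTrip = fileURLToPath(moduleUrl);
console.log(roundTrip);
```

On a Unix-like system the first line printed is the file:/// URL shown in the paragraph above; on Windows a drive letter is added to both results.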

One difference between CommonJS and ES6 modules is the ability to use extensionless module identifiers. The CommonJS module loader allows this, as in the following example, which you should save as extensionless.js:

const simple = require('./simple');

console.log(simple.hello());
console.log(`${simple.next()}`);
console.log(`${simple.next()}`);

This uses an extensionless module identifier to load a module we've already discussed, simple.js. We can also run the script with the node command using an extensionless identifier:

$ node ./extensionless
Hello, world!
1
2

But if we specify an extensionless identifier for an ES6 module, such as simpledemo2.mjs:

$ node ./simpledemo2
internal/modules/cjs/loader.js:964
  throw err;
  ^
Error: Cannot find module '/home/david/Chapter03/simpledemo2'
    at Function.Module._resolveFilename (internal/modules/cjs/loader.js:961:17)
    at Function.Module._load (internal/modules/cjs/loader.js:854:27)
    at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:71:12)
    at internal/main/run_main_module.js:17:47 {
  code: 'MODULE_NOT_FOUND',
  requireStack: []
}

The error message makes it clear that Node.js could not resolve the filename. Similarly, in an ES6 module, the filename given to the import statement must include the file extension.
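To see which file the CommonJS loader picks for an extensionless identifier, we can ask require.resolve. This is a small sketch that assumes a throwaway directory under the system temporary directory and a one-line stand-in for simple.js; neither is part of the chapter's example code.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Throwaway directory holding a stand-in for simple.js (an assumption for this demo).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'extless-'));
fs.writeFileSync(path.join(dir, 'simple.js'),
  "exports.hello = () => 'Hello, world!';\n");

// The identifier has no extension; the CommonJS loader fills in .js.
const identifier = path.join(dir, 'simple');
const resolvedName = path.basename(require.resolve(identifier));
console.log(resolvedName); // simple.js

const simple = require(identifier);
console.log(simple.hello()); // Hello, world!
```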

Next, let's discuss another side effect of ES6 module identifiers being a URL.

The ES6 import statement takes a URL

The module identifier in the ES6 import statement is a URL. There are several important considerations. 

Node.js does not, by default, load modules from http:// or https:// URLs, so we're not allowed to retrieve a module from a web server; modules on the local file system are referenced with file:// URLs. There are obvious security implications, and the corporate security team would rightfully get anxious if modules could be loaded from http:// URLs.

Referencing a file with an absolute pathname must use the file:///path/to/file.ext syntax, as mentioned earlier. This is different from require, where we would use /path/to/file.ext instead.

Since ? and # have special significance in a URL, they also have special significance to the import statement, as in the following example:

import './module-name.mjs?query=1'

This loads the module named module-name.mjs with a query string containing query=1. By default, the query string is ignored by the Node.js module loader when it locates the file, but there is an experimental loader hook feature by which you can do something with the module identifier URL.
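Because the identifier is a URL, the standard URL class can show how it breaks down into a pathname and a query string. The sketch below only parses the identifier and loads nothing; the location of the importing module is a hypothetical file:// URL chosen for illustration.

```javascript
// Hypothetical location of the module containing the import statement.
const importer = new URL('file:///home/david/Chapter03/main.mjs');

// Resolve the relative identifier against the importer, as for any relative URL.
const resolved = new URL('./module-name.mjs?query=1', importer);

console.log(resolved.pathname); // /home/david/Chapter03/module-name.mjs
console.log(resolved.search);   // ?query=1
```

The pathname is what identifies the file, while the query string travels along as part of the URL.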

The next type of module to consider is those baked into Node.js, the core modules.

Understanding the Node.js core modules

Some modules are pre-compiled into the Node.js binary. These are the core Node.js modules documented on the Node.js website at https://nodejs.org/api/index.html.

They start out as source code within the Node.js build tree. The build process compiles them into the binary so that the modules are always available.

We've already seen how the core modules are used. In a CommonJS module, we might use the following:

const http = require('http');
const fs = require('fs').promises;

And the equivalent in an ES6 module would be as follows:

import http from 'http';
import { promises as fs } from 'fs';

In both cases, we're loading the http and fs core modules that would then be used by other code in the module.
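If we are unsure whether a name refers to a core module, the module core module exposes the list of built-in names. This is a brief sketch; isCore is a helper name invented here, not a Node.js API.

```javascript
const { builtinModules } = require('module');

// builtinModules lists the names of the modules compiled into the Node.js binary.
// isCore is a hypothetical helper for this example.
const isCore = (name) => builtinModules.includes(name);

console.log(isCore('http'));    // true
console.log(isCore('fs'));      // true
console.log(isCore('express')); // false

// For a core module, require.resolve returns the name itself rather than a pathname.
const resolvedHttp = require.resolve('http');
console.log(resolvedHttp); // http
```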

Moving on, we will next talk about more complex module structures.

Using a directory as a module

We commonly organize stuff into a directory structure. The stuff here is a technical term referring to internal file modules, data files, template files, documentation, tests, assets, and more. Node.js allows us to create an entry-point module into such a directory structure.

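For example, when a module identifier like ./some-library refers to a directory, require expects to find a file named index.js, index.json, or index.node in that directory (unless a package.json file names a different entry point, as discussed in the next section). In such a case, the module loader loads the index module even though the module identifier did not reference a full pathname. The pathname is computed by appending the name of the file it finds in the directory. The ES6 import statement does not perform this directory lookup by default; there, the identifier must name the index file in full, as in ./some-library/index.mjs.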

One common use for this is that the index module provides an API for a library stored in the directory, while the other modules in the directory contain what are meant to be private implementation details.
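The arrangement can be tried out with a throwaway directory. The following sketch assumes a some-library directory created under the system temporary directory, with an index.js entry point and a helper.js standing in for a private implementation file; both file contents are invented for the demonstration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Build some-library/ with a public index.js and a private helper.js.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'dirmod-'));
const libDir = path.join(root, 'some-library');
fs.mkdirSync(libDir);
fs.writeFileSync(path.join(libDir, 'helper.js'),
  "module.exports = (name) => `Hello, ${name}!`;\n");
fs.writeFileSync(path.join(libDir, 'index.js'),
  "const helper = require('./helper');\nexports.greet = helper;\n");

// The identifier names the directory; the loader finds index.js inside it.
const someLibrary = require(libDir);
const greeting = someLibrary.greet('world');
console.log(greeting); // Hello, world!

const entryPoint = path.basename(require.resolve(libDir));
console.log(entryPoint); // index.js
```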

This may be a little confusing because the word module is being overloaded with two meanings. In some cases, a module is a file, and in other cases, a module is a directory containing one or more file modules.

While overloading the word module this way might be a little confusing, it's going to get even more so as we consider the packages we install from other sources.

Comparing installed packages and modules

Every programming platform supports the distribution of libraries or packages that are meant to be used in a wide array of applications. For example, where the Perl community has CPAN, the Node.js community has the npm registry. A Node.js installed package is the same as the directory-as-a-module structure we just described, in that the package format is simply a directory containing a package.json file along with the code and other files comprising the package.

There is the same risk of confusion caused by overloading the word module since an installed package is typically the same as the directories as modules concept just described. Therefore, it's useful to refer to an installed package with the word package.

The package.json file describes the package. A minimal set of fields is defined by Node.js, specifically as follows:

{ "name" : "some-library",
"main" : "./lib/some-library.js" }

The name field gives the name of the package. If the main field is present, it names the JavaScript file that is loaded, instead of index.js, when the package is loaded. Package manager applications like npm and Yarn support many more fields in package.json, which they use to manage dependencies, versions, and everything else.

If there is no package.json, then Node.js will look for either index.js or index.node. In such a case, require('some-library') will load the file module in /path/to/some-library/index.js.

Installed packages are kept in a directory named node_modules. When JavaScript source code has require('some-library') or import 'some-library', Node.js searches through one or more node_modules directories to find the named package.

Notice that the module identifier, in this case, is just the package name. This is different from the file and directory module identifiers we studied earlier since both those are pathnames. In this case, the module identifier is somewhat abstract, and that's because Node.js has an algorithm for finding packages within the nested structure of the node_modules directories.
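A package lookup by name can be reproduced without a package manager. The sketch below builds a throwaway project by hand, with the some-library package and main field from the package.json shown earlier, and uses createRequire from the module core module to resolve identifiers as if from a file in the project root. The app.js filename and the whoami function are hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { createRequire } = require('module');

// Throwaway project: <root>/node_modules/some-library/ with a package.json.
// realpathSync keeps the paths comparable where the temp directory is a symlink.
const root = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), 'pkg-')));
const pkgDir = path.join(root, 'node_modules', 'some-library');
fs.mkdirSync(path.join(pkgDir, 'lib'), { recursive: true });
fs.writeFileSync(path.join(pkgDir, 'package.json'),
  JSON.stringify({ name: 'some-library', main: './lib/some-library.js' }));
fs.writeFileSync(path.join(pkgDir, 'lib', 'some-library.js'),
  "exports.whoami = () => 'loaded via the main field';\n");

// A require function that resolves as if called from <root>/app.js.
// The file itself is hypothetical and does not need to exist.
const requireFromApp = createRequire(path.join(root, 'app.js'));

const lib = requireFromApp('some-library');
console.log(lib.whoami());

// Which file inside the package satisfied the bare package name?
const mainFile = path
  .relative(pkgDir, requireFromApp.resolve('some-library'))
  .split(path.sep).join('/');
console.log(mainFile); // lib/some-library.js
```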

To understand how that works, we need a deeper dive into the algorithm.

Finding the installed package in the file system 

One key to why the Node.js package system is so flexible is the algorithm used to search for packages.

For a given require, import(), or import statement, Node.js searches upward in the file system from the directory containing the statement. It is looking for a directory named node_modules containing a module satisfying the module identifier.

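For example, with a source file named /home/david/projects/notes/foo.js and a require or import statement requesting the module identifier bar.js, Node.js tries the following options, in order:

  • /home/david/projects/notes/node_modules/bar.js
  • /home/david/projects/node_modules/bar.js
  • /home/david/node_modules/bar.js
  • /home/node_modules/bar.js
  • /node_modules/bar.js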

As just said, the search starts at the same level of the file system as foo.js. Node.js will look either for a file module named bar.js or else a directory named bar.js containing a module as described earlier in Using a directory as a module. Node.js will check for this package in the node_modules directory next to foo.js and in every directory above that file. It will not, however, descend into any directory such as express or express/node_modules. The traversal only moves upward in the file system, not downward.

While some third-party packages have a name ending in .js, the vast majority do not. Therefore, we will typically use require('bar'). Third-party installed packages are also typically delivered as a directory containing a package.json file and some JavaScript files. Therefore, in the typical case, the package module identifier would be bar, and Node.js will find a directory named bar in one of the node_modules directories and access the package from that directory.

This act of searching upward in the file system means Node.js supports the nested installation of packages. A Node.js package that in turn depends on other packages will have its own node_modules directory; that is, the bar package might depend on the fred package. The package manager application might install fred as /home/david/projects/notes/node_modules/bar/node_modules/fred.

In such a case, when a JavaScript file in the bar package uses require('fred') its search for modules starts in /home/david/projects/notes/node_modules/bar/node_modules, where it will find the fred package. But if the package manager detects that other packages used by notes also use the fred package, the package manager will install it as /home/david/projects/notes/node_modules/fred.

Because the search algorithm traverses the file system upwards, it will find fred in either location.
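The upward traversal can be sketched as a small function that lists the node_modules directories consulted for a file in a given directory. This is a simplified illustration, not the Node.js implementation: nodeModulesPaths is a name invented here, it uses POSIX path rules so the output is the same on every platform, and it leaves out NODE_PATH and the global folders.

```javascript
const path = require('path').posix;

// List the node_modules directories searched for code living in startDir,
// nearest first. A directory that is itself named node_modules is skipped,
// so that node_modules/node_modules is never generated.
function nodeModulesPaths(startDir) {
  const dirs = [];
  let current = path.resolve(startDir);
  for (;;) {
    if (path.basename(current) !== 'node_modules') {
      dirs.push(path.join(current, 'node_modules'));
    }
    const parent = path.dirname(current);
    if (parent === current) break; // reached the root of the file system
    current = parent;
  }
  return dirs;
}

// Where does require('fred') look when called from inside the bar package?
const searched = nodeModulesPaths('/home/david/projects/notes/node_modules/bar');
console.log(searched.join('\n'));
```

The first entry is the nested bar/node_modules directory and the second is the project's own node_modules, which is why fred is found in either location. For the real list used by a running program, Node.js provides require.resolve.paths('fred') and the module.paths array.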

The last thing to note is that this nesting of node_modules directories can be arbitrarily deep. While the package manager applications try to install packages in a flat hierarchy, it may be necessary to nest them deeply.

One reason for doing so is to enable using two or more versions of the same package.

Handling multiple versions of the same installed package

The Node.js package identifier resolution algorithm allows us to install two or more versions of the same package. Returning to the hypothetical notes project, notice that the fred package is installed not just for the bar package but also for the express package. 

Looking at the algorithm, we know that require('fred') in the bar package, and in the express package, will be satisfied by the corresponding fred package installed locally to each.

Normally, the package manager applications will detect the two instances of the fred package and install only one. But, suppose the bar package required the fred version 1.2, while the express package required the fred version 2.1.

In such a case, the package manager application will detect the incompatibility and install two versions of the fred package as follows:

  • In /home/david/projects/notes/node_modules/bar/node_modules, it will install fred version 1.2.
  • In /home/david/projects/notes/node_modules/express/node_modules, it will install fred version 2.1.

When the express package executes require('fred') or import 'fred', it will be satisfied by the package in /home/david/projects/notes/node_modules/express/node_modules/fred. Likewise, the bar package will be satisfied by the package in /home/david/projects/notes/node_modules/bar/node_modules/fred. In both cases, the bar and express packages have the correct version of the fred package available. Neither is aware there is another version of fred installed.
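This behavior can be checked by building the two nested installations by hand. The sketch below assumes a throwaway notes project in the system temporary directory; the fred, bar, and express directories are empty stand-ins created only for this test, and the helper names installFred and fredVersionSeenBy are invented here.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { createRequire } = require('module');

const root = fs.mkdtempSync(path.join(os.tmpdir(), 'notes-'));

// Create <root>/node_modules/<parent>/node_modules/fred at the given version.
function installFred(parentPackage, version) {
  const dir = path.join(root, 'node_modules', parentPackage, 'node_modules', 'fred');
  fs.mkdirSync(dir, { recursive: true });
  fs.writeFileSync(path.join(dir, 'package.json'),
    JSON.stringify({ name: 'fred', version, main: './index.js' }));
  fs.writeFileSync(path.join(dir, 'index.js'),
    `module.exports = { version: '${version}' };\n`);
}

installFred('bar', '1.2.0');
installFred('express', '2.1.0');

// Resolve 'fred' as if from a file inside the given package.
function fredVersionSeenBy(parentPackage) {
  const requireFrom = createRequire(
    path.join(root, 'node_modules', parentPackage, 'index.js'));
  return requireFrom('fred').version;
}

console.log(`bar sees fred ${fredVersionSeenBy('bar')}`);         // bar sees fred 1.2.0
console.log(`express sees fred ${fredVersionSeenBy('express')}`); // express sees fred 2.1.0
```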

The node_modules directory is meant for packages required by an application. Node.js also supports installing packages in a global location so they can be used by multiple applications.

Searching for globally installed packages

We've already seen that with npm we can perform a global install of a package. For example, command-line tools like hexy or babel are convenient if installed globally. In such a case the package is installed in another folder outside of the project directory. Node.js has two strategies for finding globally installed packages.

Similar to the PATH variable, the NODE_PATH environment variable can be used to list additional directories in which to search for packages. On Unix-like operating systems, NODE_PATH is a colon-separated list of directories, and on Windows it is semicolon-separated. In both cases, it is similar to how the PATH variable is interpreted, meaning that NODE_PATH has a list of directory names in which to find installed modules.
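Splitting such a value into its directories takes only a few lines. In this sketch, parseNodePath is a helper name invented for illustration, and the directory names are hypothetical; path.delimiter supplies the separator appropriate to the platform the script runs on.

```javascript
const path = require('path');

// Split a NODE_PATH-style value into directories, dropping empty entries.
// The delimiter defaults to the platform's own: ':' on Unix-like systems, ';' on Windows.
function parseNodePath(value, delimiter = path.delimiter) {
  return (value || '').split(delimiter).filter((dir) => dir.length > 0);
}

// Hypothetical values, one for each convention.
const unixDirs = parseNodePath('/opt/shared/node_modules:/usr/local/lib/node_modules', ':');
const windowsDirs = parseNodePath('C:\\shared\\node_modules;D:\\tools\\node_modules', ';');
console.log(unixDirs);
console.log(windowsDirs);

// Whatever is set in the current environment, if anything.
console.log(parseNodePath(process.env.NODE_PATH));
```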

The NODE_PATH approach is not recommended, because of the surprising behavior that can occur if people are unaware that this variable must be set. If a module located in a directory referenced in NODE_PATH is required for the application to function properly and the variable is not set, the application will likely fail. The best practice is for all dependencies to be explicitly declared, and with Node.js that means listing all dependencies in the package.json file so that npm or yarn can manage the dependencies.

This variable was implemented before the module resolution algorithm just described was finalized. Because of that algorithm, NODE_PATH is largely unnecessary. 

There are three additional locations that can hold modules:

  • $HOME/.node_modules
  • $HOME/.node_libraries
  • $PREFIX/lib/node

In this case, $HOME is what you expect (the user's home directory), and $PREFIX is the directory where Node.js is installed.

Some recommend against using global packages. The rationale is the desire for repeatability and deployability. If you've tested an app and all its code is conveniently located within a directory tree, you can copy that tree for deployment to other machines. But, what if the app depended on some other file that was magically installed elsewhere on the system? Will you remember to deploy such files? The application author might write documentation saying to install this then install that and install something-else before running npm install, but will the users of the application correctly follow all those steps? 

The best installation instruction is to simply run npm install or yarn install. For that to work, all dependencies must be listed in package.json.

Before moving forward, let's review the different kinds of module identifiers.

Reviewing module identifiers and pathnames

That was a lot of details spread out over several sections. It's useful, therefore, to quickly review how the module identifiers are interpreted when using the require, import(), or import statements:

  • Relative module identifiers: These begin with ./ or ../ and follow POSIX filesystem semantics. The resultant pathname is interpreted relative to the location of the file being executed. That is, a module identifier beginning with ./ is looked for in the current directory, whereas one starting with ../ is looked for in the parent directory.
  • Absolute module identifiers: These begin with / (or file:// for ES6 modules) and are resolved starting from the root of the filesystem. This is not a recommended practice.
  • Top-level module identifiers: These do not begin with those strings and are just the module name. These must be stored in a node_modules directory, and the Node.js runtime has a nicely flexible algorithm for locating the correct node_modules directory.
  • Core modules: These are the same as the top-level module identifiers, in that there is no prefix, but the core modules are prebaked into the Node.js binary.

In all cases, except for the core modules, the module identifier resolves to a file that contains the actual module, and which is loaded by Node.js. Therefore, what Node.js does is to compute the mapping between the module identifier and the actual file name to load.
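The four categories above can be summarized as a small classifier. This is only a sketch of the rules as listed in this review; classifyIdentifier is a name invented here, and the real loader performs many more steps once the category is known.

```javascript
const { builtinModules } = require('module');

// Classify a module identifier following the four categories in the list above.
function classifyIdentifier(id) {
  if (id.startsWith('./') || id.startsWith('../')) return 'relative';
  if (id.startsWith('/') || id.startsWith('file://')) return 'absolute';
  if (builtinModules.includes(id)) return 'core';
  return 'top-level';
}

const samples = ['./simple', '../lib/util.js', '/path/to/my-module.js',
  'file:///path/to/my-module.mjs', 'fs', 'express'];
for (const id of samples) {
  console.log(`${id} -> ${classifyIdentifier(id)}`);
}
```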

Using a package manager application is not required. The Node.js module resolution algorithm does not depend on a package manager, like npm or Yarn, to set up the node_modules directories. There is nothing magical about those directories, and it is possible to use other means to construct a node_modules directory containing installed packages. But the simplest mechanism is to use a package manager application.

Some packages offer what we might call a sub-package included with the main package. Let's see how to use them.

Using deep import module specifiers

In addition to a simple module identifier like require('bar'), Node.js lets us directly access modules contained within a package. A different module specifier is used that starts with the module name, adding what's called a deep import path. For a concrete example, let's look at the mime module (https://www.npmjs.com/package/mime), which handles mapping a file name to its corresponding MIME type.

In the normal case, you use require('mime') to use the package. However, the authors of this package developed a lite version of this package that leaves out a lot of vendor-specific MIME types. For that version, you use require('mime/lite') instead. And of course, in an ES6 module, you use import 'mime' and import 'mime/lite', as appropriate.

The specifier mime/lite is an example of a deep import module specifier.

With such a module identifier, Node.js first locates the node_modules directory containing the main package. In this case, that is the mime package. By default, the deep import module is simply a pathname relative to the package directory, for example, /path/to/node_modules/mime/lite. Going by the rules we've already examined, it will be satisfied by a file named lite.js or by a directory named lite containing an index module such as index.js.
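The default behavior can be observed with a hand-built package. The sketch below does not use the real mime package; demo-pkg and its two one-line modules are hypothetical stand-ins created in a temporary directory.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { createRequire } = require('module');

// Throwaway package demo-pkg (hypothetical) with a main module and a lite.js beside it.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'deep-'));
const pkgDir = path.join(root, 'node_modules', 'demo-pkg');
fs.mkdirSync(pkgDir, { recursive: true });
fs.writeFileSync(path.join(pkgDir, 'package.json'), JSON.stringify({ name: 'demo-pkg' }));
fs.writeFileSync(path.join(pkgDir, 'index.js'), "exports.flavor = 'full';\n");
fs.writeFileSync(path.join(pkgDir, 'lite.js'), "exports.flavor = 'lite';\n");

// Resolve identifiers as if from a file in the project root.
const requireFromApp = createRequire(path.join(root, 'app.js'));

const full = requireFromApp('demo-pkg').flavor;
const lite = requireFromApp('demo-pkg/lite').flavor;
console.log(full); // full
console.log(lite); // lite
```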

But it is possible to override the default behavior and have the deep import specifier refer to a different file within the module.

Overriding a deep import module identifier

The deep import module identifier used by code using the package does not have to be the pathname used within the package source. We can put declarations in package.json describing the actual pathname for each deep import identifier. For example, a package with interior modules named ./src/cjs-module.js and ./src/es6-module.mjs can be remapped with this declaration in package.json:

{
  "exports": {
    "./cjsmodule": "./src/cjs-module.js",
    "./es6module": "./src/es6-module.mjs"
  }
}

With this, code using such a package can load the inner module using require('module-name/cjsmodule') or import 'module-name/es6module'. Notice that the filenames do not have to match what's exported.

In a package.json file using this exports feature, a request for an inner module not listed in exports will fail. Supposing the package has a ./src/hidden-module.js file, calling require('module-name/src/hidden-module.js') will fail.
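Both the remapping and the failure can be demonstrated with a hand-built package. This sketch assumes a Node.js release recent enough to support the exports field; the package contents are invented, and only the CommonJS entry from the declaration above is included to keep the example short.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { createRequire } = require('module');

// Throwaway package module-name with one exported and one hidden inner module.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'exports-'));
const pkgDir = path.join(root, 'node_modules', 'module-name');
fs.mkdirSync(path.join(pkgDir, 'src'), { recursive: true });
fs.writeFileSync(path.join(pkgDir, 'package.json'), JSON.stringify({
  name: 'module-name',
  exports: { './cjsmodule': './src/cjs-module.js' }
}));
fs.writeFileSync(path.join(pkgDir, 'src', 'cjs-module.js'), "exports.name = 'cjs-module';\n");
fs.writeFileSync(path.join(pkgDir, 'src', 'hidden-module.js'), "exports.name = 'hidden';\n");

const requireFromApp = createRequire(path.join(root, 'app.js'));

// The exported name maps to ./src/cjs-module.js.
const exportedName = requireFromApp('module-name/cjsmodule').name;
console.log(exportedName); // cjs-module

// The file exists on disk, but it is not listed in exports, so the request fails.
let hiddenErrorCode;
try {
  requireFromApp('module-name/src/hidden-module.js');
} catch (err) {
  hiddenErrorCode = err.code;
}
console.log(hiddenErrorCode); // ERR_PACKAGE_PATH_NOT_EXPORTED
```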

All these modules and packages are meant to be used in the context of a Node.js project. Let's take a brief look at a typical project.

Studying an example project directory structure

A typical Node.js project is a directory containing a package.json file declaring the characteristics of the package, especially its dependencies. That, of course, describes a directory module, meaning that each module is its own project. At the end of the day, we create applications, for example, an Express application, and these applications depend on one or more (possibly thousands of) packages that are to be installed:

This is an Express application (we'll start using Express in Chapter 5, Your First Express Application) containing a few modules installed in the node_modules directory. A typical Express application uses app.js as the main module for the application, and has code and asset files distributed in the public, routes, and views directories. Of course, the project dependencies are installed in the node_modules directory.

But let's focus on the content of the node_modules directory versus the actual project files. In this screenshot, we've selected the express package. Notice it has a package.json file and there is an index.js file. Between those two files, Node.js will recognize the express directory as a module, and calling require('express') or import 'express' will be satisfied by this directory.

The express directory has its own node_modules directory, in which are installed two packages. The question is, why are those packages installed in express/node_modules rather than as a sibling of the express package?

Earlier we discussed what happens if two modules (modules A and B) list a dependency on different versions of the same module (C). In such a case, the package manager application will install two versions of C, one as A/node_modules/C and the other as B/node_modules/C. The two copies of C are thus located such that the module search algorithm will cause module A and module B to have the correct version of module C.

That's the situation we see with express/node_modules/cookie. To verify this, we can use an npm command to query for all references to the module:

$ npm ls cookie
notes@0.0.0 /Users/David/chap05/notes
├─┬ cookie-parser@1.3.5
│ └── cookie@0.1.3
└─┬ express@4.13.4
  └── cookie@0.1.5

This says the cookie-parser module depends on version 0.1.3 of cookie, while Express depends on version 0.1.5.

Now that we can recognize what a module is and how they're found in the file system, let's discuss when we can use each of the methods to load modules.

Loading modules using require, import, and import()

Obviously require is used in CommonJS modules, and import is used in ES6 modules, but there are some details to go over. We've already discussed the format and filename differences between CommonJS and ES6 modules, so let's focus here on loading the modules.

The require function is only available in CommonJS modules, and it is used for loading a CommonJS module. The module is loaded synchronously, meaning that when the require function returns, the module is completely loaded.

By default, a CommonJS module cannot load an ES6 module. But as we saw with the simple-dynamic-import.js example, a CommonJS module can load an ES6 module using import(). Since the import() function is an asynchronous operation, it returns a Promise, and we, therefore, cannot use the resulting module as a top-level object. But we can use it inside a function:

module.exports.usesES6module = async function() {
    const es6module = await import('./es6-module.mjs');
    return es6module.functionCall();
}

And at the top-level of a Node.js script, the best we can do is the following:

import('./simple2.mjs')
.then(simple2 => {
    console.log(simple2.hello());
    console.log(simple2.next());
    console.log(simple2.next());
    console.log(`count = ${simple2.default()}`);
    console.log(`Meaning: ${simple2.meaning}`);
})
.catch(err => {
    console.error(err);
});

It's the same as the simple-dynamic-import.js example, but we are explicitly handling the Promise returned by import() rather than using an async function. While we could assign simple2 to a global variable, other code using that variable would have to accommodate the possibility the assignment hasn't yet been made.

The module object provided by import() contains the fields and functions exported with the export statements in the ES6 module. As we see here, the default export is available under the name default.

In other words, using an ES6 module in a CommonJS module is possible, so long as we accommodate waiting for the module to finish loading before using it.

The import statement is used to load ES6 modules, and it only works inside an ES6 module. The module specifier you hand to the import statement is interpreted as a URL.

An ES6 module can have multiple named exports. In the simple2.mjs we used earlier, these are the functions next, squared, and hello, and the values meaning and nocount. ES6 modules can have a single default export, as we saw in simple2.mjs.

With simpledemo2.mjs, we saw that we can import only the required things from the module:

import { default as simple, hello, next } from './simple2.mjs';

In this case, we use the exports as just the name, without referring to the module: simple(), hello(), and next().

It is possible to import just the default export:

import simple from './simple2.mjs';

In this case, we can invoke the function as simple(). We can also use what's called a namespace import; that is similar to how we import CommonJS modules:

import * as simple from './simple2.mjs';

console.log(simple.hello());
console.log(simple.next());
console.log(simple.next());
console.log(simple.default());
console.log(simple.meaning);

In this case, each property exported from the module is a property of the named object in the import statement. 

An ES6 module can also use import to load a CommonJS module. Loading the simple.js module we used earlier is accomplished as follows:

import simple from './simple.js';
console.log(simple.next());
console.log(simple.next());
console.log(simple.hello());

This is similar to the default export method shown for ES6 modules, and we can think of the module.exports object inside the CommonJS module as the default export. Indeed, the import can be rewritten as follows:

import { default as simple } from './simple.js';

This demonstrates that the CommonJS module.exports object is surfaced as default when imported.

We've learned a lot about using modules in Node.js. This included the different types of modules, and how to find them in the file system. Our next step is to learn about package management applications and the npm package repository.
