Using Node as an HTTP client
The http module doesn't just provide server capabilities; it also affords us client functionality. We might want to use this functionality for a myriad of purposes: consuming HTTP-based APIs (such as REST interfaces), scraping websites (for statistical processing, or in the absence of an API), or as the first step in automated UI testing. In this task, we're going to use http.get with process to fetch external web pages dynamically via the command line.
Getting ready
We are not creating a server, so in the name of convention, we should use a different name for our new file. Let's call it fetch.js.
How to do it...
The http.request method allows us to make requests of any kind (for example, GET, POST, DELETE, OPTIONS, and so on), but for GET requests, we can use the shorthand http.get method, as shown in the following code:

var http = require('http');
var urlOpts = {host: 'www.nodejs.org', path: '/', port: '80'};

http.get(urlOpts, function (response) {
  response.on('data', function (chunk) {
    console.log(chunk.toString());
  });
});
Essentially, we're done! Run the following command:
node fetch.js
Our console will output the HTML of nodejs.org. However, let's pad it out a bit with some interactivity and error handling, as follows:
var http = require('http');
var url = require('url');
var urlOpts = {host: 'www.nodejs.org', path: '/', port: '80'};

if (process.argv[2]) {
  // prepend http:// if it's missing; url.parse needs a protocol
  if (!process.argv[2].match('http://')) {
    process.argv[2] = 'http://' + process.argv[2];
  }
  urlOpts = url.parse(process.argv[2]);
}

http.get(urlOpts, function (response) {
  response.on('data', function (chunk) {
    console.log(chunk.toString());
  });
}).on('error', function (e) {
  console.log('error:' + e.message);
});
Now we can use our script with the help of the following command:
node fetch.js www.google.com
How it works...
The http.get method takes an object that defines the criteria of our desired request. We defined a variable called urlOpts for this purpose and set our host to www.nodejs.org. We use the process.argv property to check whether a web address has been specified via the command line.
Just like the console object, the process object is a global variable that is always available within Node's runtime environment. The process.argv[2] argument is the third command-line argument, with node and fetch.js being allocated to [0] and [1], respectively.
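To make those positions concrete, here's a minimal sketch (the args.js filename is hypothetical) that prints each command-line argument alongside its index:

// args.js - print each command-line argument with its index
process.argv.forEach(function (arg, index) {
  console.log(index + ': ' + arg);
});

// Running `node args.js www.google.com` prints something like:
// 0: /usr/local/bin/node
// 1: /path/to/args.js
// 2: www.google.com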
If process.argv[2] exists (that is, if an address has been specified), we prepend http:// if it isn't already there (url.parse requires it), and then replace our default urlOpts object with the output of url.parse. Happily, url.parse returns an object with the same properties that http.get requires.
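As a quick illustration of why that works, here's a sketch of the properties url.parse hands back (values shown as comments):

var url = require('url');
var parsed = url.parse('http://www.nodejs.org/');

console.log(parsed.host); // 'www.nodejs.org'
console.log(parsed.port); // null (no explicit port in the address)
console.log(parsed.path); // '/'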
As a client, we are interacting with the server's response to us, rather than the client's request from us. So inside the http.get callback, we listen for the data event on response instead of (as with our server examples) request. As the response data stream arrives, we output the chunks to the console.
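If we wanted the whole page as a single string rather than logging each chunk as it arrives, one common variation (a sketch, not part of the recipe) is to accumulate the chunks and act on the response's end event:

var http = require('http');

http.get({host: 'www.nodejs.org', path: '/', port: '80'}, function (response) {
  var body = '';
  response.on('data', function (chunk) {
    body += chunk; // each Buffer chunk is coerced to a string
  });
  response.on('end', function () {
    console.log('Received ' + body.length + ' characters');
  });
});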
Note
For terser APIs built on top of HTTP requests, check out the third-party modules request (https://npmjs.org/package/request) and superagent (https://npmjs.org/package/superagent).
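For instance, a minimal sketch of the same GET request using the request module might look like this (assuming it has been installed with npm install request):

var request = require('request');

request('http://www.nodejs.org/', function (err, response, body) {
  if (err) { return console.error('error: ' + err.message); }
  console.log(body); // the whole response body, already buffered for us
});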
There's more...
Let's explore some of the possibilities of the http.get method's underlying http.request method.
We'll need to fire up our server.js app from the Processing POST data recipe to receive our POST requests. Let's create the following new file and call it post.js, which we'll use to send POST requests to our POST server:

var http = require('http');
var urlOpts = {host: 'localhost', path: '/', port: '8080', method: 'POST'};

var request = http.request(urlOpts, function (response) {
  response.on('data', function (chunk) {
    console.log(chunk.toString());
  });
}).on('error', function (e) {
  console.log('error:' + e.stack);
});

// skip argv[0] (node) and argv[1] (post.js); write the rest as POST data
process.argv.forEach(function (postItem, index) {
  if (index > 1) { request.write(postItem + '\n'); }
});

request.end();
As we're using the more general http.request method, we've had to define our HTTP verb in the urlOpts variable. Our urlOpts variable also specifies the server as localhost:8080 (we must ensure that our POST server is running in order for this code to work).
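As a usage sketch, we run the two scripts in separate terminals (the server's exact output depends on how it was implemented in that recipe):

node server.js       (terminal 1: the POST server from the Processing POST data recipe)
node post.js "foo=bar"   (terminal 2: our client)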
As seen before, we set up an event listener in our callback for data on the response object. The http.request method returns a ClientRequest object, which we load into a newly declared variable called request.
After our event listeners, we loop through the command-line arguments using the forEach method from ECMAScript 5 (which is safe to use in Node, though not in older browsers). On running this script, node and post.js would be the zeroth and first arguments, so we check that our array index is greater than 1 before sending any arguments as POST data. We use request.write to send data, similar to how we would use response.write if we were building a server. Even though it takes a callback, forEach is not asynchronous (it blocks until completion), so only after every element is processed is our POST data written and our request ended. The following command shows how we use it (quoting the argument stops the shell from treating the ampersands as control operators):

node post.js "foo=bar&x=y&anotherfield=anothervalue"
We'll use our upload server from the Handling file uploads recipe to receive the files from our uploading client. To achieve this, we have to deal with the multipart data format. To inform a server of the client's intention to send multipart data, we set the Content-Type header to multipart/form-data with an additional attribute called boundary, which is a custom-named delimiter separating the parts within the multipart data, as follows:

var http = require('http');
var fs = require('fs');

var urlOpts = {host: 'localhost', path: '/', port: '8080', method: 'POST'};
var boundary = Date.now();

urlOpts.headers = {
  'Content-Type': 'multipart/form-data; boundary="' + boundary + '"'
};
We've used the fs module here too, as we'll need it later to load our files.

We've set our boundary parameter to the current Unix time (milliseconds since midnight, January 1, 1970). We won't need boundary again in this format, so let's update it with the required multipart double-dash (--) prefix and set up our http.request call, as follows:

boundary = "--" + boundary;

var request = http.request(urlOpts, function (response) {
  response.on('data', function (chunk) {
    console.log(chunk.toString());
  });
}).on('error', function (e) {
  console.log('error:' + e.stack);
});
We want to be able to stream multipart data to the server, which may be compiled from multiple files. If we streamed these files simultaneously, attempting to compile them together into the multipart format, the data would likely be mashed together from different file streams in an unpredictable order, becoming impossible to parse. So we need a way to preserve the data order.
We could build it all in one go and send it to the server afterwards. A more efficient (and Node-like) solution is to build the multipart message by progressively assembling each file into the multipart format as the file is streamed in, while instantly streaming the multipart data as it's being built.
To achieve this, we can use a recursively self-invoking function, calling each iteration from within the end event callback to ensure each stream is captured separately and in order, as follows:

(function multipartAssembler(files) {
  var f = files.shift(),
      fSize = fs.statSync(f).size;

  fs.createReadStream(f).on('end', function () {
    if (files.length) {
      multipartAssembler(files);
      return; // early finish
    }
    // any code placed here won't execute until no files are left,
    // due to the early return from the function
  });
}(process.argv.splice(2, process.argv.length)));
This is also a self-calling function: we've changed it from a declaration to an expression by wrapping parentheses around it, and then called it by appending a further pair of parentheses, passing in the command-line arguments that specify which files to upload. We'll see this by executing the following command:
node upload.js file1 file2 fileN
We use splice on the process.argv array to remove the first two arguments (which would be node and upload.js). The result is passed into our multipartAssembler function as our files parameter.
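As a quick sketch of what that splice leaves us with:

var args = ['node', '/path/to/upload.js', 'file1', 'file2'];
console.log(args.splice(2, args.length)); // => [ 'file1', 'file2' ]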
Inside our function, we immediately shift the first file off the files array and load it into the variable f, which is passed into createReadStream. Once it's finished reading, we pass any remaining files back through our multipartAssembler function and repeat the process until the array is empty. Now, let's flesh out our self-iterating function with multipart goodness, as follows:

(function multipartAssembler(files) {
  var f = files.shift(),
      fSize = fs.statSync(f).size,
      progress = 0;

  fs.createReadStream(f)
    .once('open', function () {
      request.write(boundary + '\r\n' +
        'Content-Disposition: form-data; name="userfile"; filename="' + f + '"\r\n' +
        'Content-Type: application/octet-stream\r\n' +
        'Content-Transfer-Encoding: binary\r\n\r\n');
    }).on('data', function (chunk) {
      request.write(chunk);
      progress += chunk.length;
      console.log(f + ': ' + Math.round((progress / fSize) * 10000) / 100 + '%');
    }).on('end', function () {
      if (files.length) {
        multipartAssembler(files);
        return; // early finish
      }
      request.end('\r\n' + boundary + '--\r\n\r\n\r\n');
    });
}(process.argv.splice(2, process.argv.length)));
Each part of the message is introduced by the predefined boundary initially set in the Content-Type header. Each part also has to begin with its own header, so we latch on to the open event to send this header out.

The Content-Disposition header has three parts. In this scenario, the first part will always be form-data. The remaining two parts define the name of the field (for instance, the name attribute of a file input) and the original filename. The Content-Type header can be set to whatever MIME type is relevant. However, by setting all files to application/octet-stream and Content-Transfer-Encoding to binary, we can safely treat all files the same way if all we're doing is saving to disk without any interim processing. We finish each part's header block with a double CRLF (\r\n\r\n) at the end of our request.write call.
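Putting those pieces together, the body our client streams for a single file called file1 would look roughly like this on the wire (the boundary digits are whatever Date.now() returned; CRLFs are shown as line breaks):

--1405086680000
Content-Disposition: form-data; name="userfile"; filename="file1"
Content-Type: application/octet-stream
Content-Transfer-Encoding: binary

...raw file bytes...
--1405086680000--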
Notice that we've also assigned a new progress variable at the top of the multipartAssembler function. We use this to determine the relative percentage of the upload by dividing the bytes received so far (progress) by the total file size (fSize). For example, once 51,200 bytes of a 204,800-byte file have been read, Math.round((51200 / 204800) * 10000) / 100 gives 25%. This calculation is performed in our data event callback, where we also stream each chunk to the server.
In our end event, if there are no more files to process, we end the request with the final multipart boundary, which is the same as the other boundary partitions except that it has a trailing double dash (--) appended.
See also
- The Working with real data: fetching trending tweets recipe discussed in Chapter 3, Working with Data Serialization