
Using Node as an HTTP client

The HTTP object doesn't just provide server capabilities; it also affords us client functionality. We might want to use this functionality for a myriad of purposes: consuming HTTP-based APIs (such as a REST interface), scraping websites for statistical processing or in the absence of an API, or performing the first step of automated UI testing. In this task, we're going to use http.get with the process object to fetch external web pages dynamically via the command line.

Getting ready

We are not creating a server, so by convention we should use a different name for our new file. Let's call it fetch.js.

How to do it...

The http.request method allows us to make requests with any HTTP verb (for example, GET, POST, DELETE, OPTIONS, and so on), but for GET requests we can use the http.get shorthand method, as shown in the following code:

var http = require('http');
var urlOpts = {host: 'www.nodejs.org', path: '/', port: '80'};
http.get(urlOpts, function (response) {
  response.on('data', function (chunk) {
    console.log(chunk.toString());
  });
});

Essentially, we're done! Try running the following command:

node fetch.js

Our console will output the HTML of nodejs.org. However, let's pad it out a bit with some interactivity and error handling, as follows:

var http = require('http');
var url = require('url');
var urlOpts = {host: 'www.nodejs.org', path: '/', port: '80'};
if (process.argv[2]) {
  if (!process.argv[2].match('http://')) {
    process.argv[2] = 'http://' + process.argv[2];
  }
  urlOpts = url.parse(process.argv[2]);
}
http.get(urlOpts, function (response) {
  response.on('data', function (chunk) {
     console.log(chunk.toString());
  });
}).on('error', function (e) {
  console.log('error:' + e.message);
});

Now we can use our script with the help of the following command:

node fetch.js www.google.com

How it works...

The http.get method takes an object that defines the criteria of our desired request. We defined a variable called urlOpts for this purpose and set our host to www.nodejs.org. We use the process.argv property to check whether a web address has been specified via the command line.

Just like the console object, the process object is a global variable that is always available within Node's runtime environment. The process.argv[2] argument is the third command-line argument, with node and fetch.js being allocated to [0] and [1], respectively.
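
To see this for ourselves, we could run a minimal sketch (the argv.js file name is purely illustrative) that prints every command-line argument alongside its index:

process.argv.forEach(function (arg, index) {
  console.log(index + ': ' + arg);
});

Running node argv.js www.google.com would print the path to the node binary at index 0, the full path to argv.js at index 1, and www.google.com at index 2.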

If process.argv[2] exists (that is, if an address has been specified), we prepend http:// if it isn't already there (url.parse requires it), and then replace our default urlOpts object with the output of url.parse. Happily, url.parse returns an object with the same properties that http.get requires.
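
As a quick sketch of that output (assuming http://www.google.com/ was passed on the command line):

var url = require('url');
var parsed = url.parse('http://www.google.com/');
console.log(parsed.protocol); // 'http:'
console.log(parsed.host);     // 'www.google.com'
console.log(parsed.path);     // '/'

The host and path properties line up with the options object we built by hand earlier (http.get defaults the port to 80 when it isn't specified).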

As a client, we are interacting with the server's response to us, rather than the client's request from us. So inside the http.get callback, we listen for the data event on response instead of (as with our server examples) request. As the response data stream arrives, we output the chunks to the console.
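
If we wanted the whole body in one piece rather than chunk by chunk, a minimal variation (reusing our urlOpts object from above) would buffer the chunks and print them once the end event fires:

var http = require('http');
var urlOpts = {host: 'www.nodejs.org', path: '/', port: '80'};
http.get(urlOpts, function (response) {
  var body = '';
  response.on('data', function (chunk) {
    body += chunk; //accumulate each chunk as it arrives
  });
  response.on('end', function () {
    console.log(body); //the complete response body
  });
});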

Note

For terser APIs built on top of http.request, check out the third-party modules request (https://npmjs.org/package/request) and superagent (https://npmjs.org/package/superagent).
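
For instance, a minimal sketch using request (assuming it has been installed with npm install request), which buffers the response body for us:

var request = require('request');
request('http://www.google.com', function (err, response, body) {
  if (err) { return console.log('error: ' + err.message); }
  console.log(body); //the entire response body, already concatenated
});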

There's more...

Let's explore some of the possibilities of the http.get method's underlying http.request method.

Sending POST requests

We'll need to fire up our server.js app from the Processing POST data recipe to receive our POST requests. Let's create the following new file and call it post.js, which we'll use to send POST requests to our POST server:

var http = require('http');
var urlOpts = {host: 'localhost', path: '/', port: '8080', method: 'POST'};
var request = http.request(urlOpts, function (response) {
  response.on('data', function (chunk) {
    console.log(chunk.toString());
  });
}).on('error', function (e) {
  console.log('error:' + e.stack);
});
process.argv.forEach(function (postItem, index) {
  if (index > 1) { request.write(postItem + '\n'); }
});
request.end();

As we're using the more general http.request method, we've had to define our HTTP verb in the urlOpts variable. Our urlOpts variable also specifies the server as localhost:8080 (we must ensure that our POST server is running in order for this code to work).

As seen before, we set up an event listener in our callback for data on the response object. The http.request method returns a ClientRequest object, which we load into our newly declared request variable.

After our event listeners, we loop through the command-line arguments using the ECMAScript 5 forEach method (which is safe to use in Node, though not, at the time of writing, in all browsers). On running this script, node and post.js would be the zeroth and first arguments, so we check that our array index is greater than 1 before sending any arguments as POST data. We use request.write to send data, much as we would use response.write if we were building a server. Even though it takes a callback, forEach is not asynchronous (it blocks until completion), so our POST data is written and our request ended only after every element has been processed. The following command shows how we use it:

node post.js "foo=bar&x=y&anotherfield=anothervalue"

(The quotes prevent the shell from interpreting the ampersands.)

Multipart file upload as a client

We'll use our upload server from the Handling file uploads recipe to receive the files from our uploading client. To achieve this, we have to deal with the multipart data format. To inform a server of the client's intention to send multipart data, we set the Content-Type header to multipart/form-data with an additional attribute called boundary, which is a custom-named delimiter that separates the parts within the multipart data, as follows:

var http = require('http');
var fs = require('fs');
var urlOpts = { host: 'localhost', path: '/', port: '8080', method: 'POST'};
var boundary = Date.now();
urlOpts.headers = {
  'Content-Type': 'multipart/form-data; boundary="' + boundary + '"'
};

We've also required the fs module here, as we'll need it later to load our files.

We've set our boundary parameter to the current timestamp (milliseconds since midnight, January 1, 1970 UTC). We won't need boundary again in this format, so let's update it with the required multipart double-dash (--) prefix and set up our http.request call, as follows:

boundary = "--" + boundary;
var request = http.request(urlOpts, function (response) {
  response.on('data', function (chunk) {
    console.log(chunk.toString());
  });
}).on('error', function (e) {
  console.log('error:' + e.stack);
});

We want to be able to stream multipart data to the server, which may be compiled from multiple files. If we streamed these files simultaneously, attempting to compile them together into the multipart format, the data would likely be mashed together from different file streams in an unpredictable order, becoming impossible to parse. So we need a way to preserve the data order.

We could build it all in one go and send it to the server afterwards. A more efficient (and Node-like) solution is to build the multipart message by progressively assembling each file into the multipart format as the file is streamed in, while instantly streaming the multipart data as it's being built.

To achieve this, we can use a recursively self-invoking function, calling each iteration from within the end event callback to ensure each stream is captured separately and in order, as follows:

(function multipartAssembler(files) {
  var f = files.shift(), fSize = fs.statSync(f).size;
  fs.createReadStream(f).on('end', function () {
    if (files.length) {
      multipartAssembler(files);
      return; //early finish
    }
    //any code placed here won't execute until no files are left,
    //due to the early return from the function.
  });
}(process.argv.splice(2, process.argv.length)));

This is a self-calling function: we've changed it from a declaration to an expression by wrapping parentheses around it, and then called it by appending parentheses, passing in the command-line arguments that specify which files to upload. We'll see this by executing the following command:

node upload.js file1 file2 fileN

We use splice on the process.argv array to remove the first two arguments (which would be node and upload.js). The result is passed into our multipartAssembler function as our files parameter.

Inside our function, we immediately shift the first file off the files array and load it into the variable f, which is passed into createReadStream. Once the stream has finished reading, we pass any remaining files back through our multipartAssembler function and repeat the process until the array is empty. Now, let's flesh out our self-iterating function with multipart goodness, as follows:

(function multipartAssembler(files) {
  var f = files.shift(), fSize = fs.statSync(f).size,
  progress = 0;
  fs.createReadStream(f)
  .once('open', function () {
    request.write(boundary + '\r\n' +
      'Content-Disposition: ' +
      'form-data; name="userfile"; filename="' + f + '"\r\n' +
      'Content-Type: application/octet-stream\r\n' +
      'Content-Transfer-Encoding: binary\r\n\r\n');
  }).on('data', function(chunk) {
    request.write(chunk);
    progress += chunk.length;
    console.log(f + ': ' + Math.round((progress / fSize) * 10000)/100 + '%');
  }).on('end', function () {
    if (files.length) {
      //a part's data must be terminated with a CRLF before
      //the next boundary can be recognized by the server
      request.write('\r\n');
      multipartAssembler(files);
      return; //early finish
    }
    request.end('\r\n' + boundary + '--\r\n\r\n\r\n');
  });
}(process.argv.splice(2, process.argv.length)));

We specify a part with the predefined boundary initially set in the Content-Type header. Each part has to begin with a header, so we latch onto the open event to send this header out.

The Content-Disposition header has three parts. In this scenario, the first part will always be form-data. The remaining two parts define the name of the field (for instance, the name attribute of a file input) and the original filename. The Content-Type header can be set to whatever MIME type is relevant. However, by setting all files to application/octet-stream and Content-Transfer-Encoding to binary, we can safely treat all files the same way if all we're doing is saving them to disk without any interim processing. We finish each multipart header with a double CRLF (\r\n\r\n) at the end of our request.write method.
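
To make this concrete, a single-file upload would look roughly like the following on the wire (assuming, purely for illustration, a boundary value of 1373261051662 and a file named file1; every line shown ends with a CRLF):

--1373261051662
Content-Disposition: form-data; name="userfile"; filename="file1"
Content-Type: application/octet-stream
Content-Transfer-Encoding: binary

...raw file data...
--1373261051662--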

Notice that we've also assigned a new progress variable at the top of the multipartAssembler function. We use this to determine the relative percentage of the upload by dividing the number of bytes received so far (progress) by the total file size (fSize). This calculation is performed in our data event callback, where we also stream each chunk to the server.

In our end event, if there are no more files to process, we end the request with the final multipart boundary, which is the same as the other boundary partitions except that it carries a trailing, as well as a leading, double dash (--).

See also

  • The Working with real data: fetching trending tweets recipe discussed in Chapter 3, Working with Data Serialization