- Mastering Concurrency in Python
- Quan Nguyen
- 2021-06-10 19:24:07
HTTP requests
In a typical communication process on the web, HTML texts are the data that is to be saved and/or further processed. This data first needs to be collected from web pages, but how can we go about doing that? Most of this communication takes place over the internet, more specifically the World Wide Web, which relies on the Hypertext Transfer Protocol (HTTP). In HTTP, request methods convey which data is being requested and what the server should send back.
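The following sketch makes the idea of a request method concrete. It uses a hypothetical `build_request()` helper (not part of any library) that assembles a minimal HTTP/1.1 request message by hand. The method verb is simply the first token of the request line, followed by the target path and the protocol version. Headers such as `Host` come next, and a blank line separates them from an optional body.

```python
def build_request(method: str, path: str, host: str, body: bytes = b"") -> bytes:
    """Assemble a minimal HTTP/1.1 request message as raw bytes.

    Hypothetical helper for illustration only; real clients add more headers.
    """
    lines = [
        f"{method.upper()} {path} HTTP/1.1",  # request line: method, target, version
        f"Host: {host}",  # HTTP/1.1 clients must send a Host header
        "Connection: close",  # ask the server to close the connection after replying
    ]
    if body:
        # Methods such as POST carry a payload; its size goes in Content-Length.
        lines.append(f"Content-Length: {len(body)}")
    # A blank line (CRLF CRLF) marks the end of the header section.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


if __name__ == "__main__":
    print(build_request("GET", "/", "packtpub.com").decode("ascii"))
```

For instance, `build_request("GET", "/", "packtpub.com")` yields the bytes `GET / HTTP/1.1\r\nHost: packtpub.com\r\nConnection: close\r\n\r\n`. In practice, a browser or a library such as `http.client` builds these messages for you.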
For example, when you type packtpub.com in your browser, the browser sends a request via HTTP to the Packt website's main server, asking for data from the website. If both your internet connection and Packt's server are working well, your browser then receives a response from the server, as shown in the following diagram. This response takes the form of an HTML document, which your browser interprets and displays as the corresponding web page on your screen.
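The request/response cycle described above can be reproduced entirely on one machine. The sketch below stands in for a real website under stated assumptions. It starts a throwaway server from Python's standard `http.server` module on `127.0.0.1`, using port 0 so that the operating system picks a free port, and serves a hypothetical HTML page. It then plays the role of the browser: it sends a GET request with `http.client` and reads back the status, the `Content-Type` header, and the HTML body.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page content standing in for a real website's HTML.
PAGE = b"<html><body><h1>Hello from a local server</h1></body></html>"


class PageHandler(BaseHTTPRequestHandler):
    """Plays the role of the website's server: answers every GET with PAGE."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr


def fetch_page():
    """Start a local server, send one GET request to it, and return the response."""
    # Port 0 asks the OS for any free port, so no real network access is needed.
    server = HTTPServer(("127.0.0.1", 0), PageHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # The client side: open a connection and send "GET / HTTP/1.1".
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        conn.request("GET", "/")
        response = conn.getresponse()
        result = (response.status, response.getheader("Content-Type"), response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, content_type, html = fetch_page()
    print(status, content_type)
    print(html.decode("utf-8"))
```

A real browser would go on to parse and render the returned HTML. Here `fetch_page()` simply returns it so that it can be inspected or saved, which is the step a web scraper cares about.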
Generally, request methods are defined as verbs that indicate the desired action to be performed when the HTTP client (such as a web browser) and the server communicate with each other: GET, HEAD, POST, PUT, DELETE, and so on. Of these methods, GET and POST are the two most commonly used in web-scraping applications; their functions are described in the following list:
- The GET method requests specific data from the server. It is meant only to retrieve data and is not supposed to have any other effect on the server or its databases.
- The POST method sends data, enclosed in the body of the request, for the server to accept and process. This data could be, for example, a message for a bulletin board, mailing list, or newsgroup; information submitted through a web form; or an item to be added to a database.
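A toy server can illustrate the difference between the two methods. In the hypothetical example below, an in-memory list attached to the server object stands in for a bulletin board or database. `do_GET` only reads the stored messages. `do_POST` reads the request body, using its `Content-Length` header to know how many bytes to read, and appends it as a new message. The handler and helper names are illustrative and do not come from the original text.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class BoardHandler(BaseHTTPRequestHandler):
    """Toy bulletin board; self.server.messages stands in for a database."""

    def do_GET(self):
        # GET only reads the stored messages and never modifies them.
        self._reply(200, "\n".join(self.server.messages).encode("utf-8"))

    def do_POST(self):
        # POST carries data in the request body; Content-Length gives its size in bytes.
        length = int(self.headers.get("Content-Length", 0))
        self.server.messages.append(self.rfile.read(length).decode("utf-8"))
        self._reply(201, b"created")

    def _reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def send(port, method, body=None):
    """Send one request with http.client and return (status, decoded body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request(method, "/", body=body)
    response = conn.getresponse()
    result = (response.status, response.read().decode("utf-8"))
    conn.close()
    return result


def demo():
    server = HTTPServer(("127.0.0.1", 0), BoardHandler)
    server.messages = []  # fresh, empty board for every demo run
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        first = send(port, "GET")
        second = send(port, "GET")  # repeating GET leaves the board unchanged
        posted = send(port, "POST", body="Hello, board!".encode("utf-8"))
        after = send(port, "GET")  # the POSTed message is now visible
        return first, second, posted, after
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for label, result in zip(("GET", "GET", "POST", "GET"), demo()):
        print(label, result)
```

Running `demo()` should show that two consecutive GET requests return the same (empty) board, whereas the POST request changes what the following GET returns. Replying to the POST with status 201 (Created) is a design choice of this sketch; many servers reply with 200 instead.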
All general-purpose HTTP servers that we commonly see on the internet are required to implement at least the GET and HEAD methods, while all other methods, including POST, are considered optional.
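Python's `http.server` offers a convenient way to observe this requirement. `BaseHTTPRequestHandler` dispatches each request to a `do_<METHOD>` method on the handler. When no such method exists, it answers with status 501 (Not Implemented). The minimal sketch below implements only GET and HEAD; the `probe()` helper and the page content are hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page served by the minimal server.
BODY = b"<html><body>minimal server</body></html>"


class MinimalHandler(BaseHTTPRequestHandler):
    """Defines only do_GET and do_HEAD; any other method gets 501."""

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)

    def do_HEAD(self):
        # HEAD sends the same headers as GET but no body.
        self._send_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging


def probe(method):
    """Send one request with the given method; return (status, Content-Length, body)."""
    server = HTTPServer(("127.0.0.1", 0), MinimalHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        conn.request(method, "/")
        response = conn.getresponse()
        result = (response.status, response.getheader("Content-Length"), response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for method in ("GET", "HEAD", "POST"):
        status, length, body = probe(method)
        print(method, status, length, len(body))
```

In this sketch, `probe("HEAD")` returns the same `Content-Length` header as `probe("GET")` but an empty body. `probe("POST")` yields status 501 because the handler does not define `do_POST`.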