- Mastering Concurrency in Python
- Quan Nguyen
- 2021-06-10 19:24:10
Avoid making a large number of requests
Each time one of the programs we have been discussing runs, it makes HTTP requests to the server that hosts the site you want to extract data from. In a concurrent program, where multiple requests are submitted to that server at once, these requests arrive far more frequently and within a much shorter span of time.
As mentioned before, modern servers can handle multiple requests simultaneously with ease. However, to avoid being overworked and exhausting their resources, servers are also designed to stop answering requests that come in too frequently. Websites of big tech companies, such as Amazon or Twitter, look for large numbers of automated requests made from the same IP address and respond in different ways: some requests might be delayed, some might be refused a response, or the IP address might even be banned from making further requests for a specific amount of time.
Interestingly, flooding a server with repeated, heavy-duty requests is actually a form of attack on a website. In Denial of Service (DoS) and Distributed Denial of Service (DDoS) attacks, a very large number of requests are made to the server at the same time, flooding its bandwidth with traffic. As a result, normal, nonmalicious requests from other clients are denied because the server is busy processing the flood of concurrent requests, as illustrated in the following diagram:
It is therefore important to space out the concurrent requests that your application makes to a server, so that the application is not mistaken for an attacker and banned or otherwise treated as a malicious client. This could be as simple as limiting the maximum number of threads/requests that can be active at any one time in your program, or pausing a thread for a specific amount of time (for example, using the time.sleep() function) before it makes a request to the server.
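The two measures just described can be sketched together as follows. This is only an illustration under stated assumptions, not the book's own code: it uses ThreadPoolExecutor from the standard library to cap the number of simultaneous requests, and the names throttled_map, polite_call, and fetch_status, as well as the default values for max_workers and delay, are hypothetical choices made for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_status(url, timeout=5):
    """Return the HTTP status code for url (hypothetical helper).

    urlopen raises an exception for error responses such as 4xx and 5xx.
    """
    with urlopen(url, timeout=timeout) as response:
        return response.status


def throttled_map(func, items, max_workers=3, delay=0.5):
    """Apply func to every item with at most max_workers calls in flight.

    Each call is preceded by a pause of `delay` seconds, so requests are
    spaced out instead of being sent in a single burst.
    """
    def polite_call(item):
        time.sleep(delay)  # pause before making the request
        return func(item)

    # The size of the pool caps how many requests run at the same time.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in the same order as the inputs.
        return list(executor.map(polite_call, items))
```

A call such as throttled_map(fetch_status, urls, max_workers=2, delay=1.0), where urls is a list of addresses you are permitted to query, would keep at most two requests active at once and have each worker wait one second before every request. Suitable values depend on the server in question, so treat the numbers here as placeholders.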