Retrying downloads

Often, the errors encountered when downloading are temporary; an example is when the web server is overloaded and returns a 503 Service Unavailable error. For these errors, we can retry the download after a short time because the server problem may now be resolved. However, we do not want to retry downloading for all errors. If the server returns 404 Not Found, then the web page does not currently exist and the same request is unlikely to produce a different result.

The full list of possible HTTP errors is defined by the Internet Engineering Task Force, and is available for viewing at https://tools.ietf.org/html/rfc7231#section-6. In this document, we can see that 4xx errors occur when there is something wrong with our request and 5xx errors occur when there is something wrong with the server. So, we will ensure our download function only retries the 5xx errors. Here is the updated version to support this:

import urllib.request
from urllib.error import URLError, HTTPError, ContentTooShortError

def download(url, num_retries=2):
    print('Downloading:', url)
    try:
        html = urllib.request.urlopen(url).read()
    except (URLError, HTTPError, ContentTooShortError) as e:
        print('Download error:', e.reason)
        html = None
        if num_retries > 0:
            # only HTTPError carries a status code; plain URLError does not
            if hasattr(e, 'code') and 500 <= e.code < 600:
                # recursively retry 5xx HTTP errors
                return download(url, num_retries - 1)
    return html
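
The retry condition can also be read on its own. The following is a minimal sketch, assuming a hypothetical helper named is_retryable that is not part of the original code; it applies the same 5xx range check that download uses, and treats a missing status code (None) as not retryable:

```python
def is_retryable(status_code):
    """Return True only for 5xx (server error) status codes."""
    # None stands for errors that carry no HTTP status code at all
    return status_code is not None and 500 <= status_code < 600


# 4xx codes point to a problem with the request, so retrying is pointless;
# 5xx codes point to a problem on the server, which may be temporary.
for code in (404, 500, 503, None):
    print(code, is_retryable(code))
```

Keeping the check in one small function makes it easy to widen or narrow the set of retried codes later without touching the download logic.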

Now, when a download fails with a 5xx code, the function retries the download by calling itself recursively. The function also takes an additional argument for the number of times the download can be retried, which defaults to two. We limit the number of retries because the server may not recover. To test this functionality, we can try downloading http://httpstat.us/500, which returns the 500 error code:

>>> download('http://httpstat.us/500')
Downloading: http://httpstat.us/500
Download error: Internal Server Error
Downloading: http://httpstat.us/500
Download error: Internal Server Error
Downloading: http://httpstat.us/500
Download error: Internal Server Error

As expected, the download function now tries downloading the web page, and then, on receiving the 500 error, it retries the download twice before giving up.
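
The same behavior can be checked without relying on an external service. The following is a rough sketch, assuming the standard library's unittest.mock is used to replace urllib.request.urlopen with a stand-in that always raises an HTTPError; the example.com URLs and the variable names are placeholders chosen for illustration, and the download function is repeated here only so that the block runs on its own:

```python
import io
import urllib.request
from unittest import mock
from urllib.error import URLError, HTTPError, ContentTooShortError


def download(url, num_retries=2):
    # Same logic as the download function shown in this section.
    print('Downloading:', url)
    try:
        html = urllib.request.urlopen(url).read()
    except (URLError, HTTPError, ContentTooShortError) as e:
        print('Download error:', e.reason)
        html = None
        if num_retries > 0:
            if hasattr(e, 'code') and 500 <= e.code < 600:
                return download(url, num_retries - 1)
    return html


def make_error(url, code, reason):
    # Hypothetical helper: builds an HTTPError with an empty body and headers.
    return HTTPError(url, code, reason, {}, io.BytesIO())


# A 5xx error: one initial attempt plus two retries are expected.
url_500 = 'http://example.com/flaky'
with mock.patch('urllib.request.urlopen',
                side_effect=make_error(url_500, 500, 'Internal Server Error')) as fake:
    result_500 = download(url_500)
    attempts_500 = fake.call_count

# A 4xx error: the function should give up after the first attempt.
url_404 = 'http://example.com/missing'
with mock.patch('urllib.request.urlopen',
                side_effect=make_error(url_404, 404, 'Not Found')) as fake:
    result_404 = download(url_404)
    attempts_404 = fake.call_count

print('500 attempts:', attempts_500)
print('404 attempts:', attempts_404)
```

Counting calls on the mock confirms both halves of the rule: the 500 response triggers the retries, while the 404 response does not.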
