- Python Web Scraping(Second Edition)
- Katharine Jarmul Richard Lawson
- 2021-07-09 19:42:46
Supporting proxies
Sometimes it's necessary to access a website through a proxy. For example, Hulu is blocked in many countries outside the United States, as are some videos on YouTube. Supporting proxies with urllib is not as easy as it could be. Later in this chapter, we will cover requests, a more user-friendly Python HTTP module that can also handle proxies. Here's how to support a proxy with urllib:
import urllib.request

proxy = 'http://myproxy.net:1234'  # example string
proxy_support = urllib.request.ProxyHandler({'http': proxy})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)
# now http requests made via urllib.request.urlopen will go through the proxy
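Note that install_opener changes the opener used by every later urllib.request.urlopen call in the process. As a minimal sketch of an alternative, the same handler can be kept in a local opener instead. The helper name make_proxy_opener is hypothetical and not part of urllib, and the proxy address is the same placeholder string used above.

```python
import urllib.request


def make_proxy_opener(proxy):
    """Return an opener that sends http:// requests through `proxy`.

    Hypothetical helper: unlike install_opener(), nothing global changes.
    """
    proxy_support = urllib.request.ProxyHandler({'http': proxy})
    return urllib.request.build_opener(proxy_support)


opener = make_proxy_opener('http://myproxy.net:1234')  # example string
# opener.open(url) would now go through the proxy, while
# urllib.request.urlopen(url) keeps its default behaviour.
```

This may be convenient when only some downloads should use the proxy, since each call site chooses between opener.open and urllib.request.urlopen.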
Here is an updated version of the download function to integrate this:
import urllib.request
from urllib.error import URLError, HTTPError, ContentTooShortError

def download(url, user_agent='wswp', num_retries=2, charset='utf-8', proxy=None):
    print('Downloading:', url)
    request = urllib.request.Request(url)
    request.add_header('User-agent', user_agent)
    try:
        if proxy:
            # route http requests through the given proxy
            proxy_support = urllib.request.ProxyHandler({'http': proxy})
            opener = urllib.request.build_opener(proxy_support)
            urllib.request.install_opener(opener)
        resp = urllib.request.urlopen(request)
        cs = resp.headers.get_content_charset()
        if not cs:
            # fall back to the default when the response declares no charset
            cs = charset
        html = resp.read().decode(cs)
    except (URLError, HTTPError, ContentTooShortError) as e:
        print('Download error:', e.reason)
        html = None
        if num_retries > 0:
            if hasattr(e, 'code') and 500 <= e.code < 600:
                # recursively retry 5xx HTTP errors, keeping the same settings
                return download(url, user_agent=user_agent,
                                num_retries=num_retries - 1,
                                charset=charset, proxy=proxy)
    return html
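The retry rule inside the except branch can be looked at in isolation. The following is only a sketch: is_retryable is a hypothetical helper that mirrors the condition used in download, and the error objects are constructed by hand for illustration, so no network access is involved.

```python
from urllib.error import HTTPError, URLError


def is_retryable(error, num_retries):
    """Hypothetical helper mirroring the retry test in download()."""
    has_retries_left = num_retries > 0
    # only HTTPError carries a status code; a plain URLError does not
    is_server_error = hasattr(error, 'code') and 500 <= error.code < 600
    return has_retries_left and is_server_error


# Errors built by hand for illustration.
server_error = HTTPError('http://example.com', 503, 'Service Unavailable', None, None)
not_found = HTTPError('http://example.com', 404, 'Not Found', None, None)
no_connection = URLError('connection refused')

print(is_retryable(server_error, 2))   # True: 5xx with retries left
print(is_retryable(not_found, 2))      # False: 4xx errors are not retried
print(is_retryable(no_connection, 2))  # False: no HTTP status code at all
print(is_retryable(server_error, 0))   # False: retries exhausted
```

Only server-side (5xx) failures are retried, since a 4xx response or a connection-level error is unlikely to be fixed by simply repeating the same request.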
As of Python 3.5, the urllib module does not support https proxies by default. This may change in future versions of Python, so check the latest documentation. Alternatively, you can use the documentation's recommended recipe (https://code.activestate.com/recipes/456195/) or keep reading to learn how to use the requests library.