- Python Web Scraping(Second Edition)
- Katharine Jarmul Richard Lawson
- 2021-07-09 19:42:46
Supporting proxies
Sometimes it's necessary to access a website through a proxy. For example, Hulu is blocked in many countries outside the United States, as are some videos on YouTube. Supporting proxies with urllib is not as easy as it could be. Later in this chapter we will cover requests, a more user-friendly Python HTTP module that also handles proxies. Here's how to support a proxy with urllib:
import urllib.request

proxy = 'http://myproxy.net:1234'  # example string
proxy_support = urllib.request.ProxyHandler({'http': proxy})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)
# from now on, http:// requests made via urllib.request.urlopen go through the proxy
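Note that install_opener changes the default opener for the whole process, so every later urllib.request.urlopen call is affected. As a minimal sketch of an alternative, assuming only some requests should be proxied, the opener can be kept local and called directly. The helper name build_proxy_opener is hypothetical, and the proxy address is the same placeholder as above.

```python
import urllib.request


def build_proxy_opener(proxy):
    """Return an opener that routes http:// requests through `proxy`.

    Hypothetical helper, not part of urllib; nothing is installed globally.
    """
    proxy_support = urllib.request.ProxyHandler({'http': proxy})
    return urllib.request.build_opener(proxy_support)


# placeholder address, for illustration only
opener = build_proxy_opener('http://myproxy.net:1234')
# opener.open(url) would send an http:// request via the proxy, while
# plain urllib.request.urlopen(url) keeps using the default opener.
```

Keeping the opener local avoids surprising other code in the same process that also uses urllib.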
Here is an updated version of the download function to integrate this:
import urllib.request
from urllib.error import URLError, HTTPError, ContentTooShortError

def download(url, user_agent='wswp', num_retries=2, charset='utf-8', proxy=None):
    print('Downloading:', url)
    request = urllib.request.Request(url)
    request.add_header('User-agent', user_agent)
    try:
        if proxy:
            # route http:// requests through the given proxy
            proxy_support = urllib.request.ProxyHandler({'http': proxy})
            opener = urllib.request.build_opener(proxy_support)
            urllib.request.install_opener(opener)
        resp = urllib.request.urlopen(request)
        # fall back to the default charset if the response does not declare one
        cs = resp.headers.get_content_charset()
        if not cs:
            cs = charset
        html = resp.read().decode(cs)
    except (URLError, HTTPError, ContentTooShortError) as e:
        print('Download error:', e.reason)
        html = None
        if num_retries > 0:
            if hasattr(e, 'code') and 500 <= e.code < 600:
                # recursively retry 5xx HTTP errors, keeping the same settings
                return download(url, user_agent, num_retries - 1, charset, proxy)
    return html
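The retry rule in download (retry only when the server answers with a 5xx status) can be isolated and exercised without a network connection. The following is a minimal sketch, assuming a hypothetical fetch callable that stands in for the urlopen, read, and decode steps and either returns the page text or raises a urllib error. It uses a loop instead of recursion, which is a substitution made for illustration and not the book's approach.

```python
from urllib.error import HTTPError, URLError


def fetch_with_retries(fetch, url, num_retries=2):
    """Call fetch(url), retrying only on 5xx HTTP errors.

    `fetch` is a hypothetical callable supplied by the caller.
    Returns the fetched text, or None if the download ultimately fails.
    """
    attempts = num_retries + 1  # the first try plus the allowed retries
    for _ in range(attempts):
        try:
            return fetch(url)
        except URLError as e:  # HTTPError is a subclass of URLError
            print('Download error:', e.reason)
            code = getattr(e, 'code', None)
            if code is None or not 500 <= code < 600:
                break  # 4xx responses and connection errors are not retried
    return None
```

Passing the fetch step in as an argument means the retry behavior can be checked with a fake function that raises HTTPError a fixed number of times before succeeding.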
As of Python 3.5, the urllib module does not support HTTPS proxies by default. This may change in future versions of Python, so check the latest documentation. Alternatively, you can use the recipe recommended in the documentation (https://code.activestate.com/recipes/456195/) or keep reading to learn how to use the requests library.