- Python Web Scraping (Second Edition)
- Katharine Jarmul, Richard Lawson
- 2021-07-09 19:42:45
Downloading a web page
To scrape web pages, we first need to download them. Here is a simple Python script that uses Python's urllib module to download a URL:
import urllib.request

def download(url):
    # Fetch the URL and return the raw response body (bytes)
    return urllib.request.urlopen(url).read()
When a URL is passed, this function downloads the web page and returns its HTML as raw bytes. The problem with this snippet is that, when downloading the web page, we might encounter errors that are beyond our control; for example, the requested page may no longer exist. In these cases, urllib raises an exception, which, if left uncaught, terminates the script. To be safer, here is a more robust version that catches these exceptions:
import urllib.request
from urllib.error import URLError, HTTPError, ContentTooShortError

def download(url):
    print('Downloading:', url)
    try:
        html = urllib.request.urlopen(url).read()
    except (URLError, HTTPError, ContentTooShortError) as e:
        # Report why the download failed and return None instead of crashing
        print('Download error:', e.reason)
        html = None
    return html
Now, when a download or URL error is encountered, the exception is caught and the function returns None.
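You can see this behaviour without touching the network. The sketch below repeats the robust download() function so that it is self-contained, then calls it on two local URLs. The first is a data: URL, which urllib decodes in memory. The second is a file: URL pointing at a path assumed not to exist on your machine. Both URLs are illustrative assumptions and do not come from the book.

The sketch also checks that HTTPError and ContentTooShortError are subclasses of URLError. This means catching URLError alone would be enough; listing all three in the except clause is harmless but redundant.

```python
import urllib.request
from urllib.error import URLError, HTTPError, ContentTooShortError


def download(url):
    """Same as the robust version above: return the page as bytes, or None on error."""
    print('Downloading:', url)
    try:
        html = urllib.request.urlopen(url).read()
    except (URLError, HTTPError, ContentTooShortError) as e:
        print('Download error:', e.reason)
        html = None
    return html


# Both specialised errors derive from URLError, so `except URLError` alone would catch them.
print(issubclass(HTTPError, URLError), issubclass(ContentTooShortError, URLError))

# A data: URL is decoded in memory by urllib's built-in DataHandler; no network needed.
page = download('data:text/plain,Hello')
print(page.decode('ascii'))  # read() returns bytes, so decode to get a str

# A file: URL for a path assumed not to exist makes urllib raise URLError,
# which download() catches and turns into None.
missing = download('file:///no/such/file.html')
print(missing)  # None
```

Because download() returns None rather than raising, callers should test for None before trying to parse the result.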
Throughout this book, we will assume you are creating files with code that is presented without prompts (like the code above). When you see code that begins with a Python prompt >>> or an IPython prompt In [1]:, you will need to either enter it into the main file you have been using, or save the file and import those functions and classes into your Python interpreter. If you run into any issues, please take a look at the code in the book repository at https://github.com/kjam/wswp.