Title: Python Web Scraping Cookbook
Author: Michael Heydt
Updated: 2021-06-30 18:44:12

How it works

The constructor for URLUtility calls urllib.parse.urlparse. The following demonstrates using the function interactively:

>>> parsed = urlparse(const.ApodEclipseImage())
>>> parsed
ParseResult(scheme='https', netloc='apod.nasa.gov', path='/apod/image/1709/BT5643s.jpg', params='', query='', fragment='')
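
The same call can be tried without the book's const module by passing the URL as a literal string. This is a minimal sketch, assuming that const.ApodEclipseImage() returns the URL reconstructed from the ParseResult shown above:

```python
from urllib.parse import urlparse

# Assumption: this literal stands in for const.ApodEclipseImage();
# it is rebuilt from the scheme, netloc, and path in the output above.
url = "https://apod.nasa.gov/apod/image/1709/BT5643s.jpg"

parsed = urlparse(url)

# ParseResult is a named tuple, so each component can be read
# by attribute name or by position.
print(parsed.scheme)
print(parsed.netloc)
print(parsed.path)
print(parsed[2] == parsed.path)
```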

The ParseResult object contains the various components of the URL. The path element holds the directory path together with the filename. The filename_without_ext property returns just the filename, without its extension:

@property
def filename_without_ext(self):
    filename = os.path.splitext(os.path.basename(self._parsed.path))[0]
    return filename

The call to os.path.basename returns only the filename portion of the path (including the extension). os.path.splitext() then splits that filename into a (root, extension) tuple, and the property returns the first element of that tuple (the filename without its extension).
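
The two steps can be traced separately on the path from the ParseResult shown earlier. This is a minimal sketch; the variable names are illustrative and are not part of URLUtility:

```python
import os.path

# Path component taken from the ParseResult shown earlier.
path = "/apod/image/1709/BT5643s.jpg"

# Step 1: drop the directory portion, keeping the filename and extension.
basename = os.path.basename(path)

# Step 2: split the filename into a (root, extension) tuple.
root, ext = os.path.splitext(basename)

print(basename)  # BT5643s.jpg
print(root)      # BT5643s
print(ext)       # .jpg
```

Note that os.path.splitext splits only at the last dot, so a name such as archive.tar.gz would yield archive.tar and .gz.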
