- Python Programming Blueprints
- Daniel Furtado, Marcus Pennington
- 2021-06-24 18:53:47
Adding helper methods
To start with, we need to import some packages:
import re
from weatherterm.core import Forecast
from weatherterm.core import Request
from weatherterm.core import Unit
from weatherterm.core import UnitConverter
And in the initializer, we are going to add the following code:
self._base_url = 'http://weather.com/weather/{forecast}/l/{area}'
self._request = Request(self._base_url)
self._temp_regex = re.compile(r'([0-9]+)\D{,2}([0-9]+)')
self._only_digits_regex = re.compile('[0-9]+')
self._unit_converter = UnitConverter(Unit.FAHRENHEIT)
In the initializer, we define the URL template we are going to use to perform requests to the weather website; then, we create a Request object. This is the object that will perform the requests for us.
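As a minimal sketch of how the URL template might be filled in, the snippet below uses plain str.format. The values 'today' and 'AREA_CODE' are made-up placeholders, and how the Request object actually builds the final URL is not shown in this section.

```python
# URL template copied from the initializer above.
base_url = 'http://weather.com/weather/{forecast}/l/{area}'

# 'today' and 'AREA_CODE' are hypothetical placeholder values.
url = base_url.format(forecast='today', area='AREA_CODE')
print(url)  # http://weather.com/weather/today/l/AREA_CODE
```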
The two regular expressions are used only when parsing the temperatures in today's weather forecast.
We also define a UnitConverter object and set the default unit to Fahrenheit.
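To illustrate what the temperature regex from the initializer captures, here is a small sketch. The sample string is an assumption about what the scraped text could look like, and calling search (rather than match) is a choice made for this example only.

```python
import re

# Two groups of digits separated by at most two non-digit characters.
temp_regex = re.compile(r'([0-9]+)\D{,2}([0-9]+)')

sample = '65°49°'  # hypothetical scraped text, not taken from the site
match = temp_regex.search(sample)
first, second = match.groups()
print(first, second)  # 65 49
```

Both groups come back as strings, so they would still need converting before any arithmetic is done on them.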
Now, we are ready to add two methods that are responsible for searching for HTML elements with a certain class and returning their contents. The first method is called _get_data:
def _get_data(self, container, search_items):
    scraped_data = {}
    for key, value in search_items.items():
        result = container.find(value, class_=key)
        data = None if result is None else result.get_text()
        if data is not None:
            scraped_data[key] = data
    return scraped_data
The idea of this method is to search a container for items that match some criteria. The container is just a DOM element in the HTML, and search_items is a dictionary where each key is a CSS class and each value is the type of HTML element to look for. It can be a DIV, a SPAN, or any other element that you wish to get the value from.
It starts looping through search_items.items() and uses the find method to find the element within the container. If the item is found, we use get_text to extract the text of the DOM element and add it to a dictionary that will be returned when there are no more items to search.
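The behavior described above can be tried out without a real page. The sketch below is a standalone copy of the method's logic, and FakeElement is a hypothetical stand-in that only mimics the find and get_text calls used here; the class names and texts are invented for illustration.

```python
class FakeElement:
    """Hypothetical stand-in for a parsed HTML element."""

    def __init__(self, name, css_class, text='', children=()):
        self.name = name
        self.css_class = css_class
        self.text = text
        self.children = list(children)

    def find(self, name, class_=None):
        # Return the first descendant with this tag name and class, else None.
        for child in self.children:
            if child.name == name and child.css_class == class_:
                return child
            found = child.find(name, class_=class_)
            if found is not None:
                return found
        return None

    def get_text(self):
        return self.text


def get_data(container, search_items):
    # Same logic as _get_data, written as a plain function.
    scraped_data = {}
    for key, value in search_items.items():
        result = container.find(value, class_=key)
        if result is not None:
            scraped_data[key] = result.get_text()
    return scraped_data


container = FakeElement('div', 'today', children=[
    FakeElement('span', 'temp', '65'),
    FakeElement('div', 'phrase', 'Sunny'),
])

# Keys are CSS classes, values are tag names; 'missing' matches nothing.
search_items = {'temp': 'span', 'phrase': 'div', 'missing': 'p'}
scraped = get_data(container, search_items)
print(scraped)  # {'temp': '65', 'phrase': 'Sunny'}
```

Note that the key with no matching element is simply left out of the result rather than being stored as None.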
The second method that we will implement is the _parse method. This will make use of the _get_data method that we just implemented:
def _parse(self, container, criteria):
    results = [self._get_data(item, criteria)
               for item in container.children]
    return [result for result in results if result]
Here, we also receive a container and criteria, as in the _get_data method. The container is a DOM element and criteria is a dictionary of the nodes that we want to find. The first comprehension iterates over the container's children and passes each one to the _get_data method.
The results will be a list of dictionaries with all the items that have been found, and we will only return the dictionaries that are not empty.
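A rough sketch of the filtering step follows. Row and Cell are hypothetical stand-ins for child elements (they only support the find and get_text calls needed here), and the class names 'day' and 'temp' are invented for the example.

```python
from types import SimpleNamespace


class Cell:
    """Hypothetical stand-in for a found element."""

    def __init__(self, text):
        self._text = text

    def get_text(self):
        return self._text


class Row:
    """Hypothetical child element: maps (tag, css_class) to a text value."""

    def __init__(self, cells):
        self._cells = cells

    def find(self, name, class_=None):
        text = self._cells.get((name, class_))
        return None if text is None else Cell(text)


def get_data(container, search_items):
    scraped_data = {}
    for key, value in search_items.items():
        result = container.find(value, class_=key)
        if result is not None:
            scraped_data[key] = result.get_text()
    return scraped_data


def parse(container, criteria):
    results = [get_data(item, criteria) for item in container.children]
    # Empty dictionaries are falsy, so children with no matches drop out.
    return [result for result in results if result]


container = SimpleNamespace(children=[
    Row({('td', 'day'): 'Mon', ('td', 'temp'): '65'}),
    Row({}),  # a child that contains none of the criteria
    Row({('td', 'day'): 'Tue'}),
])

parsed = parse(container, {'day': 'td', 'temp': 'td'})
print(parsed)  # [{'day': 'Mon', 'temp': '65'}, {'day': 'Tue'}]
```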
There are only two more helper methods we need to implement in order to get today's weather forecast in place. Let's implement a method called _clear_str_number:
def _clear_str_number(self, str_number):
    result = self._only_digits_regex.match(str_number)
    return '--' if result is None else result.group()
This method uses a regular expression to return only the leading digits of the string. If the string does not start with a digit, it returns '--' instead.
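The following standalone sketch shows how this behaves on a few inputs. The sample strings are made up; the point to notice is that match only looks at the start of the string.

```python
import re

only_digits_regex = re.compile('[0-9]+')


def clear_str_number(str_number):
    # match() anchors at the beginning, so only leading digits are kept.
    result = only_digits_regex.match(str_number)
    return '--' if result is None else result.group()


print(clear_str_number('65°'))    # 65
print(clear_str_number('--'))     # --
print(clear_str_number('N/A 7'))  # --
```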
And the last method that needs to be implemented is the _get_additional_info method:
def _get_additional_info(self, content):
    data = tuple(item.td.span.get_text()
                 for item in content.table.tbody.children)
    return data[:2]
This method loops through the table rows, getting the text of the span in the first cell of each row. The generator expression yields lots of information about the weather, but we are only interested in the first two items: the wind and the humidity.
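To make the attribute chain easier to follow, here is a sketch that fakes the table with SimpleNamespace objects. The row texts are invented, and a real table may contain other kinds of children that this stand-in does not model.

```python
from types import SimpleNamespace


def make_row(text):
    # Hypothetical row exposing only row.td.span.get_text().
    span = SimpleNamespace(get_text=lambda: text)
    return SimpleNamespace(td=SimpleNamespace(span=span))


rows = [make_row('SW 10 mph'), make_row('55%'), make_row('48°')]
content = SimpleNamespace(
    table=SimpleNamespace(tbody=SimpleNamespace(children=rows)))


def get_additional_info(content):
    data = tuple(item.td.span.get_text()
                 for item in content.table.tbody.children)
    return data[:2]


wind, humidity = get_additional_info(content)
print(wind, humidity)  # SW 10 mph 55%
```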