官术网_书友最值得收藏!

Adding helper methods

To start with, we need to import some packages:

import re

from weatherterm.core import Forecast
from weatherterm.core import Request
from weatherterm.core import Unit
from weatherterm.core import UnitConverter

And in the initializer, we are going to add the following code:

self._base_url = 'http://weather.com/weather/{forecast}/l/{area}'
self._request = Request(self._base_url)

self._temp_regex = re.compile('([0-9]+)\D{,2}([0-9]+)')
self._only_digits_regex = re.compile('[0-9]+')

self._unit_converter = UnitConverter(Unit.FAHRENHEIT)

In the initializer, we define the URL template we are going to use to perform requests to the weather website; then, we create a Request object. This is the object that will perform the requests for us.

Regular expressions are only used when parsing today's weather forecast temperatures.

We also define a UnitConverter object and set the default unit to Fahrenheit.

Now, we are ready to start adding two methods that will be responsible for actually searching for HTML elements within a certain class and return its contents. The first method is called _get_data:

def _get_data(self, container, search_items):
scraped_data = {}

for key, value in search_items.items():
result = container.find(value, class_=key)

data = None if result is None else result.get_text()

if data is not None:
scraped_data[key] = data

return scraped_data

The idea of this method is to search items within a container that matches some criteria. The container is just a DOM element in the HTML and the search_items is a dictionary where the key is a CSS class and the value is the type of the HTML element. It can be a DIV, SPAN, or anything that you wish to get the value from.

It starts looping through search_items.items() and uses the find method to find the element within the container. If the item is found, we use get_text to extract the text of the DOM element and add it to a dictionary that will be returned when there are no more items to search.

The second method that we will implement is the _parser method. This will make use of the _get_data that we just implemented:

def _parse(self, container, criteria):
results = [self._get_data(item, criteria)
for item in container.children]

return [result for result in results if result]

Here, we also get a container and criteria like the _get_data method. The container is a DOM element and the criterion is a dictionary of nodes that we want to find. The first comprehension gets all the container's children elements and passes them to the _get_data method.

The results will be a list of dictionaries with all the items that have been found, and we will only return the dictionaries that are not empty.

There are only two more helper methods we need to implement in order to get today's weather forecast in place. Let's implement a method called _clear_str_number:

def _clear_str_number(self, str_number):
result = self._only_digits_regex.match(str_number)
return '--' if result is None else result.group()

This method will use a regular expression to make sure that only digits are returned.

And the last method that needs to be implemented is the _get_additional_info method:

def _get_additional_info(self, content):
data = tuple(item.td.span.get_text()
for item in content.table.tbody.children)
return data[:2]

This method loops through the table rows, getting the text of every cell. This comprehension will return lots of information about the weather, but we are only interested in the first 2, the wind and the humidity.

主站蜘蛛池模板: 偏关县| 嘉善县| 白玉县| 辰溪县| 永定县| 宜兰县| 长阳| 宾阳县| 凤阳县| 林甸县| 银川市| 雷波县| 萝北县| 册亨县| 抚州市| 高碑店市| 马关县| 五原县| 长岭县| 龙川县| 石泉县| 天台县| 德阳市| 镇坪县| 互助| 通化市| 阳西县| 保靖县| 武宣县| 仪陇县| 浏阳市| 浠水县| 江口县| 清水河县| 鄂温| 垣曲县| 河间市| 汽车| 镇雄县| 沅江市| 龙江县|