- Python Web Scraping Cookbook
- Michael Heydt
- 2021-06-30 18:43:58
Scraping Python.org with Scrapy
Scrapy is a very popular open source Python scraping framework for extracting data. It was originally designed only for scraping, but it has also evolved into a powerful web crawling solution.
In our previous recipes, we used Requests and urllib2 to fetch data and Beautiful Soup to extract data. Scrapy offers all of this functionality, along with many other built-in modules and extensions. It is also our tool of choice when it comes to scraping with Python.
Scrapy offers a number of powerful features that are worth mentioning:
- Built-in extensions to make HTTP requests and to handle compression, authentication, caching, user agents, and HTTP headers
- Built-in support for selecting and extracting data with selector languages such as CSS and XPath, as well as support for utilizing regular expressions for selection of content and links
- Encoding support to deal with different languages and non-standard encoding declarations
- Flexible APIs to reuse and write custom middleware and pipelines, which provide a clean and easy way to implement tasks such as automatically downloading assets (for example, images or media) and storing data in backends such as file systems, S3, databases, and others
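To make the last point more concrete, here is a minimal sketch of a custom item pipeline. In Scrapy, an item pipeline is an ordinary Python class with a `process_item(item, spider)` method, plus optional `open_spider` and `close_spider` hooks, so the class below needs no Scrapy import. The class name `JsonLinesPipeline`, the output file name `items.jl`, and the assumption that items are dict-like are all hypothetical and chosen only for illustration.

```python
import json


class JsonLinesPipeline:
    """Hypothetical pipeline: writes each scraped item as one JSON line."""

    def __init__(self, path="items.jl"):
        # Assumed output location; Scrapy would create the pipeline
        # without arguments, so the default is what it would use.
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Hook called once when the spider starts.
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        # Hook called once when the spider finishes.
        self.file.close()

    def process_item(self, item, spider):
        # Assumes the item is dict-like; serialize it and pass it on
        # unchanged so later pipelines can still process it.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item
```

In a Scrapy project, a class like this would typically be enabled through the `ITEM_PIPELINES` setting, for example `ITEM_PIPELINES = {"myproject.pipelines.JsonLinesPipeline": 300}`, where `myproject` is a placeholder module path and the number sets the order in which pipelines run.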