- Python Web Scraping Cookbook
- Michael Heydt
- 2021-06-30 18:44:05
Working with CSV and JSON data
Extracting data from HTML pages is done using the techniques in the previous chapter, primarily XPath through various tools, and also Beautiful Soup. While we will focus primarily on HTML, HTML is closely related to XML (eXtensible Markup Language). XML was once the most popular format for expressing data on the web, but other formats have become popular and have even exceeded XML in popularity.
Two common formats that you will see are JSON (JavaScript Object Notation) and CSV (Comma Separated Values). CSV is easy to create and is a common format for many spreadsheet applications. As a result, many websites provide data in that format, or you will need to convert scraped data to it for further storage or collaboration. JSON has really become the preferred format because it is easy to use within programming languages such as JavaScript (and Python), and many databases now support it as a native data format.
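To make the contrast concrete, here is a minimal sketch that serializes the same two records to CSV and to JSON. It uses Python's standard csv and json modules. The records and field names are hypothetical stand-ins for scraped data.

```python
import csv
import io
import json

# Hypothetical scraped records; the field names are illustrative only.
records = [
    {"name": "Widget", "price": 9.99, "in_stock": True},
    {"name": "Gadget", "price": 24.5, "in_stock": False},
]

def to_csv_string(rows):
    """Serialize a list of dicts to CSV text, header row first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)  # non-string values are written via str()
    return buffer.getvalue()

def to_json_string(rows):
    """Serialize the same records to JSON; numbers and booleans stay typed."""
    return json.dumps(rows, indent=2)

csv_text = to_csv_string(records)
json_text = to_json_string(records)
print(csv_text)
print(json_text)
```

If you read the CSV text back with csv.DictReader, you get "24.5" as a string rather than a float. JSON round-trips the original types, which is one practical reason it is often preferred when code will consume the data.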
In this recipe, we will examine converting scraped data to CSV and JSON, writing the data to files, and reading those data files from remote servers. The tools we will examine are Python's csv and json libraries. We will also examine using pandas for these techniques.
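Here is a minimal sketch of the file round trip using only the standard library. The helper names (write_csv, read_csv, write_json, read_json, read_remote_csv, read_remote_json) are hypothetical. The remote readers assume the server returns UTF-8 text at a URL you supply; they are defined here but not called.

```python
import csv
import io
import json
import os
import tempfile
import urllib.request

# Hypothetical scraped rows; values are strings because CSV reads back as text.
rows = [
    {"city": "Oslo", "temp_c": "4"},
    {"city": "Lima", "temp_c": "19"},
]

def write_csv(path, rows):
    # newline="" lets the csv module manage line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

def read_csv(path):
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def write_json(path, rows):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2)

def read_json(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def read_remote_csv(url):
    """Fetch CSV over HTTP(S); assumes the response body is UTF-8 text."""
    with urllib.request.urlopen(url) as response:
        text = io.TextIOWrapper(response, encoding="utf-8", newline="")
        return list(csv.DictReader(text))

def read_remote_json(url):
    """Fetch JSON over HTTP(S); json.load accepts the raw byte stream."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)

# Round-trip through a throwaway directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "rows.csv")
    json_path = os.path.join(tmp, "rows.json")
    write_csv(csv_path, rows)
    write_json(json_path, rows)
    csv_back = read_csv(csv_path)
    json_back = read_json(json_path)

print(csv_back == rows, json_back == rows)
```

pandas covers the same ground. pandas.read_csv and pandas.read_json accept URLs as well as local paths, and DataFrame.to_csv and DataFrame.to_json write the results back out.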
Also implicit in these examples is the conversion of XML data to CSV and JSON, so we won't have a dedicated section for those examples.
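As a rough illustration of that implicit XML step, the sketch below flattens a small XML document into records and then emits both CSV and JSON. It uses the standard library's xml.etree.ElementTree, whose findall supports a limited subset of XPath. The XML payload and its element and attribute names are hypothetical.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical XML payload; element and attribute names are illustrative.
XML_DATA = """<products>
  <product id="1"><name>Widget</name><price>9.99</price></product>
  <product id="2"><name>Gadget</name><price>24.50</price></product>
</products>"""

def xml_to_records(xml_text):
    """Flatten each <product> element into a dict of strings."""
    root = ET.fromstring(xml_text)
    records = []
    for product in root.findall("product"):  # simple XPath-style path
        records.append({
            "id": product.get("id"),             # attribute value
            "name": product.findtext("name"),    # child element text
            "price": product.findtext("price"),
        })
    return records

def records_to_csv(records):
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "name", "price"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

records = xml_to_records(XML_DATA)
print(records_to_csv(records))
print(json.dumps(records, indent=2))
```

Once the XML is reduced to a list of dicts, the CSV and JSON writers shown for scraped HTML data apply unchanged. That is why the conversion needs no separate treatment.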