
Working with CSV and JSON data

Extracting data from HTML pages is done using the techniques in the previous chapter, primarily XPath through various tools, and also Beautiful Soup. While we will focus primarily on HTML, HTML is closely related to XML (eXtensible Markup Language). XML was once the most popular format for expressing data on the web, but other formats have become popular and have even exceeded XML in popularity.

Two common formats that you will see are JSON (JavaScript Object Notation) and CSV (Comma Separated Values). CSV is easy to create and is a common format for many spreadsheet applications, so many websites provide data in that format, or you will need to convert scraped data to that format for further storage or collaboration. JSON has become the preferred format because it is easy to use within programming languages such as JavaScript (and Python), and many databases now support it as a native data format.

In this recipe, let's examine converting scraped data to CSV and JSON, writing that data to files, and reading those data files from remote servers. The tools we will examine are Python's csv and json libraries. We will also examine using pandas for these techniques.
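
As a rough preview of the conversion described above, the following minimal sketch serializes a small list of records to CSV and JSON text with the standard library and then parses both back. The `planets` records, their field names, and the helper names `to_csv` and `to_json` are hypothetical and used only for illustration; real scraped data would take their place. It works on in-memory strings rather than files so that it runs anywhere.

```python
import csv
import io
import json

# Hypothetical scraped records; names and values are illustrative only.
planets = [
    {"name": "Mercury", "mass": 0.330, "radius": 4879},
    {"name": "Venus", "mass": 4.87, "radius": 12104},
]


def to_csv(records):
    """Serialize a list of dicts to CSV text, using the first record's keys as the header."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize a list of dicts to an indented JSON string."""
    return json.dumps(records, indent=2)


csv_text = to_csv(planets)
json_text = to_json(planets)

# Reading the data back: CSV yields strings for every field,
# while JSON preserves numbers as numbers.
rows_from_csv = list(csv.DictReader(io.StringIO(csv_text)))
rows_from_json = json.loads(json_text)
```

Note the asymmetry on the way back: CSV carries no type information, so numeric fields return as strings and need explicit conversion, whereas the JSON round trip reproduces the original values.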


Also implicit in these examples is the conversion of XML data to CSV and JSON, so we won't have a dedicated section for those examples.
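
To make the implicit XML conversion concrete, here is a minimal sketch that assumes a flat XML document in which each record is an element whose children are simple text fields. The `xml_text` document and its tag names are hypothetical; nested or attribute-heavy XML would need more handling than shown here.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical XML input; the tag names are assumptions for illustration.
xml_text = """<planets>
  <planet><name>Mercury</name><radius>4879</radius></planet>
  <planet><name>Venus</name><radius>12104</radius></planet>
</planets>"""

root = ET.fromstring(xml_text)

# Turn each <planet> element into a dict of child tag -> child text.
records = [
    {child.tag: child.text for child in planet}
    for planet in root.findall("planet")
]

# Once the data is a list of dicts, JSON and CSV output follow the usual pattern.
xml_as_json = json.dumps(records)

out = io.StringIO(newline="")
writer = csv.DictWriter(out, fieldnames=["name", "radius"])
writer.writeheader()
writer.writerows(records)
xml_as_csv = out.getvalue()
```

The intermediate list of dicts is the key step: once XML has been flattened into that shape, the CSV and JSON writers do not need to know where the data came from.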