
Introduction

Effective scraping depends on three things: understanding how content and data are stored on web servers, identifying the data you want to retrieve, and understanding how the tools support that extraction. In this chapter, we will discuss website structures and the DOM, and introduce techniques to parse and query websites with lxml, XPath, and CSS selectors. We will also look at how to work with websites written in other languages and with different text encodings, such as Unicode.

Ultimately, understanding how to find and extract data within an HTML document comes down to understanding the structure of the HTML page, its representation in the DOM, the process of querying the DOM for specific elements, and how to specify which elements you want to retrieve based upon how the data is represented.
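
As a rough illustration of that process, the sketch below parses a small page fragment into a tree and queries it in three ways: by tag, by attribute, and by position in the hierarchy. It is only a sketch under stated assumptions: the markup and its planet names are made up for the example, and it uses the standard library's xml.etree.ElementTree (which supports a limited XPath subset and requires well-formed markup) as a stand-in for lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for illustration.
PAGE = """
<html>
  <body>
    <div id="planets">
      <h1>Planets</h1>
      <ul>
        <li class="planet">Mercury</li>
        <li class="planet">Venus</li>
        <li class="moon">Luna</li>
      </ul>
    </div>
  </body>
</html>
""".strip()

# Parse the markup into a tree of elements (a simplified stand-in for the DOM).
root = ET.fromstring(PAGE)

# Query by structure: every <li> element anywhere below the root.
all_items = [li.text for li in root.iter("li")]

# Query by how the data is represented: only <li> elements with class="planet".
planets = [li.text for li in root.findall(".//li[@class='planet']")]

# Query by position in the hierarchy: the <h1> inside the div with id="planets".
heading = root.find(".//div[@id='planets']/h1").text

print(all_items)
print(planets)
print(heading)
```

Real-world HTML is often not well-formed, which is one reason to use a dedicated HTML parser such as lxml instead of a strict XML parser like the one in this sketch.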
