官术网_书友最值得收藏!

How it works

XPath is a element of the XSLT (eXtensible Stylesheet Language Transformation) standard and provides the ability to select nodes in an XML document. HTML is a variant of XML, and hence XPath can work on on HTML document (although HTML can be improperly formed and mess up XPath parsing in those cases).

XPath itself is designed to model the structure of XML nodes, attributes, and properties. The syntax provides means of finding items in the XML that match the expression. This can include matching or logical comparison of any of the nodes, attributes, values, or text in the XML document.

XPath expressions can be combined to form very complex paths within the document. It is also possible to navigate the document based upon relative positions, which helps greatly in finding data based upon relative positions instead of absolute positions within the DOM.

Understanding XPath is essential for knowing how to parse HTML and perform web scraping. And as we will see, it underlies, and provides an implementation for, many of the higher level libraries such as lxml.

主站蜘蛛池模板: 肥东县| 阜阳市| 大城县| 武功县| 文成县| 贵德县| 遂昌县| 瓦房店市| 鹤岗市| 霍城县| 靖安县| 武清区| 霍州市| 金阳县| 卢龙县| 涿鹿县| 西盟| 平乡县| 调兵山市| 张家川| 梅河口市| 青铜峡市| 宜黄县| 城市| 吉木乃县| 武义县| 周至县| 南郑县| 沅陵县| 陕西省| 辽源市| 孟连| 铜鼓县| 正定县| 灵武市| 大安市| 辉县市| 鲁甸县| 贵州省| 彭水| 旅游|