How it works...

In every case, the input is handed to the Jsoup class for parsing.

For an HTML string, you just need to pass the string as the parameter of the Jsoup.parse() method.
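As a minimal sketch of the string case, assuming the jsoup library is on the classpath: the class name ParseStringExample, the helper parseHtml(), and the sample markup are made up for illustration and do not come from the recipe.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class ParseStringExample {

    // Hypothetical helper: wraps Jsoup.parse(String) for an in-memory HTML string.
    static Document parseHtml(String html) {
        return Jsoup.parse(html);
    }

    public static void main(String[] args) {
        // Sample markup, assumed for this sketch only.
        String html = "<html><head><title>Sample</title></head>"
                + "<body><p>Hello</p></body></html>";

        Document doc = parseHtml(html);

        // Read the <title> text and the text of the <p> element.
        System.out.println(doc.title());
        System.out.println(doc.select("p").text());
    }
}
```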

For an HTML file, Jsoup.parse() takes two parameters. The first one is the File object, which points to the specified HTML file; the second one is the character set of the file. There is an overload of this method with an additional third parameter, Jsoup.parse(File file, String charsetName, String baseUri). The baseUri parameter is the URL from which the HTML file was retrieved; it is used to resolve relative paths or links.
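The effect of baseUri is easiest to see on a relative link. The following is a sketch under stated assumptions: jsoup is on the classpath, and the class ParseFileExample, the helper firstAbsoluteLink(), the temporary file, and the http://example.com/docs/ base are all invented for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class ParseFileExample {

    // Hypothetical helper: parses the file as UTF-8 and resolves the first
    // link's href against baseUri. Assumes the file contains at least one <a>.
    static String firstAbsoluteLink(File htmlFile, String baseUri) throws IOException {
        Document doc = Jsoup.parse(htmlFile, "UTF-8", baseUri);
        return doc.select("a").first().absUrl("href");
    }

    public static void main(String[] args) throws IOException {
        // Write a small sample page to a temporary file (assumed content).
        File tmp = File.createTempFile("sample", ".html");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(),
                "<a href=\"page.html\">Next</a>".getBytes(StandardCharsets.UTF_8));

        // The relative href "page.html" is resolved against the base URI.
        System.out.println(firstAbsoluteLink(tmp, "http://example.com/docs/"));
        // prints http://example.com/docs/page.html
    }
}
```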

For a URL, you need to use the Jsoup.connect() method. It returns an object that implements the Connection interface. Through this object, you can easily get the content of the URL page using the Connection.get() method, which sends the request and parses the response.
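A rough sketch of the URL case follows, assuming jsoup is on the classpath and the machine has network access. The class ParseUrlExample, the helpers prepare() and fetchTitle(), the ten-second timeout, and the https://example.com/ address are illustrative choices, not part of the recipe.

```java
import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class ParseUrlExample {

    // Hypothetical helper: builds the Connection without sending a request yet.
    static Connection prepare(String url) {
        return Jsoup.connect(url).timeout(10_000); // timeout in milliseconds
    }

    // Hypothetical helper: get() performs the HTTP GET and parses the response.
    static String fetchTitle(String url) throws IOException {
        Document doc = prepare(url).get();
        return doc.title();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder address, assumed for this sketch only.
        System.out.println(fetchTitle("https://example.com/"));
    }
}
```

Because get() throws an IOException when the request fails, the caller has to handle or declare it.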

The previous example is straightforward. Whatever the input, parsing with the Jsoup class returns a Document object, which represents the DOM structure of the HTML page, with <html> as the root node.
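To illustrate the shape of the returned Document, here is a small sketch, again assuming jsoup is on the classpath. The class DocumentRootExample, the helper rootTagName(), and the input fragment are hypothetical. Even when the input is only a fragment, the parser builds a full tree around it.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class DocumentRootExample {

    // Hypothetical helper: returns the tag name of the Document's first child element.
    static String rootTagName(String html) {
        Document doc = Jsoup.parse(html);
        return doc.child(0).tagName();
    }

    public static void main(String[] args) {
        // The input has no <html> element, but the parsed tree still starts at one.
        System.out.println(rootTagName("<p>Only a paragraph</p>")); // prints html

        // The original paragraph ends up inside the generated <body>.
        Document doc = Jsoup.parse("<p>Only a paragraph</p>");
        System.out.println(doc.body().child(0).tagName()); // prints p
    }
}
```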
