- Instant Nokogiri
- Hunter Powers
- 482 words
- 2021-04-10 00:04:05
So, what is Nokogiri?
Nokogiri (http://nokogiri.org/) is the most popular open-source Ruby gem for HTML and XML parsing. It parses HTML and XML documents into node sets and allows for searching with CSS3 and XPath selectors. It may also be used to construct new HTML and XML objects.
The Nokogiri homepage is shown in the following screenshot:

Nokogiri is fast and efficient. It combines the raw power of the native C parser Libxml2 (http://www.xmlsoft.org/) with the intuitive parsing API of Hpricot (https://github.com/hpricot/hpricot).
The primary use case for a parsing library is data scraping. Data scraping is the process of extracting data intended for humans and structuring it for input into another program. Data by itself is meaningless without structure. Software imposes a rigid structure on data, referred to as a format.
The same can be said of spoken language. We do not yell out random sounds and expect them to have meaning. We use words to form sentences, and sentences to convey meaning. This is our format. It is a loose structure. You could learn ten words in a foreign language, combine those with a few hand gestures, and add in a little amateur acting to convey fairly advanced concepts to people who don't speak your native tongue. This interpretive prowess is not shared by computers. Computer communication must follow protocols; fail to follow the protocol and no communication takes place.
The goal here is to bridge the two. Take the data intended for humans, get rid of the superfluous, and parse it into a structured data format for a computer. Data intended for humans is inherently fickle because its structure changes frequently. Data scraping should be used as a last resort and is generally appropriate in two scenarios: interfacing systems with incompatible data formats, and third-party sources lacking an API. If you aren't solving one of these two problems, you probably shouldn't be scraping.
An example of this is the most common scrape-and-parse use case in tutorials on the Internet: Amazon price searching. The scenario is this: you have a database of products and you want up-to-date pricing information. The tutorials inevitably lead you through the process of scraping and parsing Amazon's search results to extract prices. The problem is that Amazon already provides all of this information and more through the Amazon Product Advertising API.
It is important to remember that you are using someone else's server resources when scraping. This is why the preferred method of accessing information should always be a developer-approved API. An API will generally provide faster, cleaner, and more direct access to data without placing an undue toll on the provider's servers.
A wealth of information sits waiting on the Internet. Only a small fraction is made easily accessible to developers via APIs. Nokogiri bridges that gap with its slick, fast HTML and XML parsing engine bundled in an easy-to-use Ruby gem.