Go Web Scraping Quick Start Guide
Web scraping is the process of extracting information from the web using various tools that perform scraping and crawling. Go is emerging as the language of choice for scraping using a variety of libraries. This book will quickly explain how to scrape data from various websites using Go libraries such as Colly and Goquery. The book starts with an introduction to the use cases of building a web scraper and the main features of the Go programming language, along with setting up a Go environment. It then moves on to HTTP requests and responses and talks about how Go handles them. You will also learn a number of basic points of web scraping etiquette. You will be taught how to navigate through a website, using a breadth-first and then a depth-first search, as well as how to find and follow links. You will get to know the ways to track history in order to avoid loops and to protect your web scraper using proxies. Finally, the book will cover the Go concurrency model and how to run scrapers in parallel, along with large-scale distributed web scraping.
Table of Contents (137 chapters)
- Cover Page
- Title Page
- Copyright and Credits
- Go Web Scraping Quick Start Guide
- About Packt
- Why subscribe?
- Packt.com
- Contributors
- About the author
- About the reviewer
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Conventions used
- Get in touch
- Reviews
- Introducing Web Scraping and Go
- What is web scraping?
- Why do you need a web scraper?
- Search engines
- Price comparison
- Building datasets
- What is Go?
- Why is Go a good fit for web scraping?
- Go is fast
- Go is safe
- Go is simple
- How to set up a Go development environment
- Go language and tools
- Git
- Editor
- Summary
- The Request/Response Cycle
- What do HTTP requests look like?
- HTTP request methods
- HTTP headers
- Query parameters
- Request body
- What do HTTP responses look like?
- Status line
- Response headers
- Response body
- What are HTTP status codes?
- 100–199 range
- 200–299 range
- 300–399 range
- 400–499 range
- 500–599 range
- What do HTTP requests/responses look like in Go?
- A simple request example
- Summary
- Web Scraping Etiquette
- What is a robots.txt file?
- What is a User-Agent string?
- Example
- How to throttle your scraper
- How to use caching
- Cache-Control
- Expires
- Etag
- Caching content in Go
- Summary
- Parsing HTML
- What is the HTML format?
- Syntax
- Structure
- Searching using the strings package
- Example – Counting links
- Example – Doctype check
- Searching using the regexp package
- Example – Finding links
- Example – Finding prices
- Searching using XPath queries
- Example – Daily deals
- Example – Collecting products
- Searching using Cascading Style Sheets selectors
- Example – Daily deals
- Example – Collecting products
- Summary
- Web Scraping Navigation
- Following links
- Example – Daily deals
- Submitting forms
- Example – Submitting searches
- Example – POST method
- Avoiding loops
- Breadth-first versus depth-first crawling
- Depth-first
- Breadth-first
- Navigating with JavaScript
- Example – Book reviews
- Summary
- Protecting Your Web Scraper
- Virtual private servers
- Proxies
- Public and shared proxies
- Dedicated proxies
- Price
- Location
- Type
- Anonymity
- Proxies in Go
- Virtual private networks
- Boundaries
- Whitelists
- Blacklists
- Summary
- Scraping with Concurrency
- What is concurrency?
- Concurrency pitfalls
- Race conditions
- Deadlocks
- The Go concurrency model
- Goroutines
- Channels
- sync package helpers
- Conditions
- Atomic counters
- Summary
- Scraping at 100x
- Components of a web scraping system
- Queue
- Cache
- Storage
- Logs
- Scraping HTML pages with colly
- Scraping JavaScript pages with chrome-protocol
- Example – Amazon Daily Deals
- Distributed scraping with dataflowkit
- The Fetch service
- The Parse service
- Summary
- Other Books You May Enjoy
- Leave a review - let other readers know what you think

Updated: 2021-07-02 13:58:34