Go Web Scraping Quick Start Guide
Web scraping is the process of extracting information from the web using various tools that perform scraping and crawling. Go is emerging as the language of choice for scraping, using a variety of libraries. This book will quickly explain how to scrape data from various websites using Go libraries such as Colly and Goquery. The book starts with an introduction to the use cases of building a web scraper and the main features of the Go programming language, along with setting up a Go environment. It then moves on to HTTP requests and responses and talks about how Go handles them. You will also learn about a number of basic web scraping etiquettes. You will be taught how to navigate through a website, using a breadth-first and then a depth-first search, as well as find and follow links. You will get to know about the ways to track history in order to avoid loops and to protect your web scraper using proxies. Finally, the book will cover the Go concurrency model and how to run scrapers in parallel, along with large-scale distributed web scraping.
Table of Contents (137 chapters)
- Cover Page
- Title Page
- Copyright and Credits
- Go Web Scraping Quick Start Guide
- About Packt
- Why subscribe?
- Packt.com
- Contributors
- About the author
- About the reviewer
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Conventions used
- Get in touch
- Reviews
- Introducing Web Scraping and Go
- What is web scraping?
- Why do you need a web scraper?
- Search engines
- Price comparison
- Building datasets
- What is Go?
- Why is Go a good fit for web scraping?
- Go is fast
- Go is safe
- Go is simple
- How to set up a Go development environment
- Go language and tools
- Git
- Editor
- Summary
- The Request/Response Cycle
- What do HTTP requests look like?
- HTTP request methods
- HTTP headers
- Query parameters
- Request body
- What do HTTP responses look like?
- Status line
- Response headers
- Response body
- What are HTTP status codes?
- 100–199 range
- 200–299 range
- 300–399 range
- 400–499 range
- 500–599 range
- What do HTTP requests/responses look like in Go?
- A simple request example
- Summary
- Web Scraping Etiquette
- What is a robots.txt file?
- What is a User-Agent string?
- Example
- How to throttle your scraper
- How to use caching
- Cache-Control
- Expires
- Etag
- Caching content in Go
- Summary
- Parsing HTML
- What is the HTML format?
- Syntax
- Structure
- Searching using the strings package
- Example – Counting links
- Example – Doctype check
- Searching using the regexp package
- Example – Finding links
- Example – Finding prices
- Searching using XPath queries
- Example – Daily deals
- Example – Collecting products
- Searching using Cascading Style Sheets selectors
- Example – Daily deals
- Example – Collecting products
- Summary
- Web Scraping Navigation
- Following links
- Example – Daily deals
- Submitting forms
- Example – Submitting searches
- Example – POST method
- Avoiding loops
- Breadth-first versus depth-first crawling
- Depth-first
- Breadth-first
- Navigating with JavaScript
- Example – Book reviews
- Summary
- Protecting Your Web Scraper
- Virtual private servers
- Proxies
- Public and shared proxies
- Dedicated proxies
- Price
- Location
- Type
- Anonymity
- Proxies in Go
- Virtual private networks
- Boundaries
- Whitelists
- Blacklists
- Summary
- Scraping with Concurrency
- What is concurrency
- Concurrency pitfalls
- Race conditions
- Deadlocks
- The Go concurrency model
- Goroutines
- Channels
- sync package helpers
- Conditions
- Atomic counters
- Summary
- Scraping at 100x
- Components of a web scraping system
- Queue
- Cache
- Storage
- Logs
- Scraping HTML pages with colly
- Scraping JavaScript pages with chrome-protocol
- Example – Amazon Daily Deals
- Distributed scraping with dataflowkit
- The Fetch service
- The Parse service
- Summary
- Other Books You May Enjoy
- Leave a review - let other readers know what you think

Updated: 2021-07-02 13:58:34