Go Web Scraping Quick Start Guide
Web scraping is the process of extracting information from the web using various tools that perform scraping and crawling. Go is emerging as the language of choice for scraping using a variety of libraries. This book will quickly explain how to scrape data from various websites using Go libraries such as Colly and Goquery. The book starts with an introduction to the use cases of building a web scraper and the main features of the Go programming language, along with setting up a Go environment. It then moves on to HTTP requests and responses and talks about how Go handles them. You will also learn a number of basic rules of web scraping etiquette. You will be taught how to navigate through a website, using a breadth-first and then a depth-first search, as well as how to find and follow links. You will get to know the ways to track history in order to avoid loops and to protect your web scraper using proxies. Finally, the book will cover the Go concurrency model and how to run scrapers in parallel, along with large-scale distributed web scraping.
Table of contents (137 chapters)
- Cover page
- Title Page
- Copyright and Credits
- Go Web Scraping Quick Start Guide
- About Packt
- Why subscribe?
- Packt.com
- Contributors
- About the author
- About the reviewer
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Conventions used
- Get in touch
- Reviews
- Introducing Web Scraping and Go
- What is web scraping?
- Why do you need a web scraper?
- Search engines
- Price comparison
- Building datasets
- What is Go?
- Why is Go a good fit for web scraping?
- Go is fast
- Go is safe
- Go is simple
- How to set up a Go development environment
- Go language and tools
- Git
- Editor
- Summary
- The Request/Response Cycle
- What do HTTP requests look like?
- HTTP request methods
- HTTP headers
- Query parameters
- Request body
- What do HTTP responses look like?
- Status line
- Response headers
- Response body
- What are HTTP status codes?
- 100–199 range
- 200–299 range
- 300–399 range
- 400–499 range
- 500–599 range
- What do HTTP requests/responses look like in Go?
- A simple request example
- Summary
- Web Scraping Etiquette
- What is a robots.txt file?
- What is a User-Agent string?
- Example
- How to throttle your scraper
- How to use caching
- Cache-Control
- Expires
- Etag
- Caching content in Go
- Summary
- Parsing HTML
- What is the HTML format?
- Syntax
- Structure
- Searching using the strings package
- Example – Counting links
- Example – Doctype check
- Searching using the regexp package
- Example – Finding links
- Example – Finding prices
- Searching using XPath queries
- Example – Daily deals
- Example – Collecting products
- Searching using Cascading Style Sheets selectors
- Example – Daily deals
- Example – Collecting products
- Summary
- Web Scraping Navigation
- Following links
- Example – Daily deals
- Submitting forms
- Example – Submitting searches
- Example – POST method
- Avoiding loops
- Breadth-first versus depth-first crawling
- Depth-first
- Breadth-first
- Navigating with JavaScript
- Example – Book reviews
- Summary
- Protecting Your Web Scraper
- Virtual private servers
- Proxies
- Public and shared proxies
- Dedicated proxies
- Price
- Location
- Type
- Anonymity
- Proxies in Go
- Virtual private networks
- Boundaries
- Whitelists
- Blacklists
- Summary
- Scraping with Concurrency
- What is concurrency?
- Concurrency pitfalls
- Race conditions
- Deadlocks
- The Go concurrency model
- Goroutines
- Channels
- sync package helpers
- Conditions
- Atomic counters
- Summary
- Scraping at 100x
- Components of a web scraping system
- Queue
- Cache
- Storage
- Logs
- Scraping HTML pages with colly
- Scraping JavaScript pages with chrome-protocol
- Example – Amazon Daily Deals
- Distributed scraping with dataflowkit
- The Fetch service
- The Parse service
- Summary
- Other Books You May Enjoy
- Leave a review - let other readers know what you think

Last updated: 2021-07-02 13:58:34