
Scraping versus crawling

Depending on the information you are after and on the site's content and structure, you may need to build either a web scraper or a web crawler. What is the difference?

A web scraper is usually built to target a particular website or set of sites and to gather specific information from them. Because it is written to access specific pages, a scraper needs to be modified if the site changes or if the information moves to a different location on the site. For example, you might build a web scraper to check the daily specials at your favorite local restaurant; to do so, you would scrape the part of the restaurant's site where that information is regularly updated.
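To make the idea concrete, here is a minimal sketch of a scraper for the restaurant example, using only Python's standard library. The page markup, the `daily-specials` id, and the names `PAGE`, `SpecialsScraper`, and `scrape_specials` are all assumptions made for illustration; a real scraper would download the page and would need to match whatever markup the site actually uses.

```python
from html.parser import HTMLParser

# Hypothetical page markup; a real scraper would download this from the site.
PAGE = """
<html><body>
  <h1>Luigi's</h1>
  <ul id="daily-specials"><li>Mushroom risotto</li><li>Lemon tart</li></ul>
  <div id="hours">Open 11-22</div>
</body></html>
"""


class SpecialsScraper(HTMLParser):
    """Collects the text of <li> items inside the element with id="daily-specials"."""

    def __init__(self):
        super().__init__()
        self.depth = 0         # nesting depth inside the target element (0 = outside)
        self.in_item = False   # True while inside an <li> of the target element
        self.specials = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
            if tag == "li":
                self.in_item = True
        elif dict(attrs).get("id") == "daily-specials":
            # Assumed id; this is the site-specific part that breaks if the site changes.
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if tag == "li":
                self.in_item = False

    def handle_data(self, data):
        if self.in_item and data.strip():
            self.specials.append(data.strip())


def scrape_specials(html):
    """Return the list of specials found in the given HTML string."""
    parser = SpecialsScraper()
    parser.feed(html)
    return parser.specials


print(scrape_specials(PAGE))
```

Notice that the scraper knows exactly where to look. If the restaurant renamed or moved that part of the page, this sketch would return an empty list until it was updated, which is the maintenance cost described above.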

In contrast, a web crawler is usually built in a generic way, targeting either websites from a series of top-level domains or the entire web. Crawlers can be built to gather more specific information, but they are usually used to crawl the web, picking up small, generic bits of information from many different sites or pages and following links to other pages.
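A crawler, by comparison, can be sketched as a loop that fetches a page, records something generic about it, and queues every link it finds. The example below is only an illustration under stated assumptions: `FAKE_WEB` is a made-up in-memory set of pages so the code runs without network access, and `LinkAndTitleParser` and `crawl` are hypothetical names, not part of any library.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "http://a.example/": '<title>A home</title><a href="/about">About</a>'
                         '<a href="http://b.example/">B</a>',
    "http://a.example/about": '<title>About A</title><a href="/">Home</a>',
    "http://b.example/": '<title>B home</title><a href="http://a.example/">A</a>',
}


class LinkAndTitleParser(HTMLParser):
    """Collects every link target and the page title; nothing site-specific."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl from start_url; returns a {url: title} dict.

    fetch is any callable that returns the HTML for a URL, or None on failure.
    """
    seen = {start_url}
    queue = deque([start_url])
    titles = {}
    while queue and len(titles) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkAndTitleParser()
        parser.feed(html)
        titles[url] = parser.title
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links against the page URL
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return titles


print(crawl("http://a.example/", FAKE_WEB.get))
```

Nothing in `crawl` depends on the layout of any one site: it follows links from page to page and collects the same small piece of information (the title) from each. The `seen` set keeps it from revisiting pages, and `max_pages` bounds the crawl.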

In addition to crawlers and scrapers, we will also cover web spiders in Chapter 8, Scrapy. Spiders can be used for crawling a specific set of sites or for broader crawls across many sites or even the Internet.

Generally, we will use specific terms to reflect our use cases. As you develop your web scraping skills, you may notice distinctions among the technologies, libraries, and packages available to you. Knowing how these terms differ will help you select an appropriate package or technology based on the terminology it uses (for example, is it only for scraping, or does it also support spiders?).
