
Scraping versus crawling

Depending on the information you are after and on the site's content and structure, you may need to build either a web scraper or a web crawler. What is the difference?

A web scraper is usually built to target a particular website or set of sites and to gather specific information from them. Because it is written to access specific pages, it will need to be modified if the site changes or if the information moves to a different location on the site. For example, you might want to build a web scraper to check the daily specials at your favorite local restaurant, and to do so you would scrape the part of their site where they regularly update that information.
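As a rough illustration of the restaurant example, the sketch below pulls the specials out of one known spot on a page using only Python's standard library. The markup is an assumption made up for this example: a list with the hypothetical id `daily-specials`, embedded in a hard-coded `SAMPLE_PAGE` string so the code runs without a network connection. The names `SpecialsParser` and `scrape_specials` are likewise illustrative, not part of any library.

```python
from html.parser import HTMLParser


class SpecialsParser(HTMLParser):
    """Collect the text of <li> items inside <ul id="daily-specials">.

    The id "daily-specials" is a hypothetical location chosen for this sketch.
    """

    def __init__(self):
        super().__init__()
        self.in_specials = False  # inside the targeted <ul>?
        self.in_item = False      # inside an <li> of that list?
        self.specials = []

    def handle_starttag(self, tag, attrs):
        if tag == "ul" and dict(attrs).get("id") == "daily-specials":
            self.in_specials = True
        elif tag == "li" and self.in_specials:
            self.in_item = True
            self.specials.append("")

    def handle_endtag(self, tag):
        if tag == "ul":
            self.in_specials = False
        elif tag == "li":
            self.in_item = False

    def handle_data(self, data):
        if self.in_item:
            self.specials[-1] += data


def scrape_specials(html):
    """Return the specials listed in the targeted part of the page."""
    parser = SpecialsParser()
    parser.feed(html)
    return [item.strip() for item in parser.specials]


# Made-up page standing in for the restaurant's site.
SAMPLE_PAGE = """
<html><body>
<h1>Our Restaurant</h1>
<ul id="menu"><li>Pizza</li></ul>
<ul id="daily-specials">
  <li>Mushroom risotto</li>
  <li>Lemon tart</li>
</ul>
</body></html>
"""

print(scrape_specials(SAMPLE_PAGE))
```

Note how tightly the code is bound to the page layout: if the restaurant renamed or moved that list, the scraper would return nothing until it was updated.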

In contrast, a web crawler is usually built in a generic way, targeting either websites from a series of top-level domains or the entire web. Crawlers can be built to gather more specific information, but they are usually used to crawl the web, picking up small and generic bits of information from many different sites or pages and following links to other pages.
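To contrast with a targeted scraper, here is a minimal sketch of the crawling idea: start from one page, record a small generic fact about it (its title), and follow its links to further pages. Everything here is an assumption for illustration. The `crawl` function takes a hypothetical `fetch` callable so that a small in-memory dictionary, `FAKE_WEB`, can stand in for real HTTP requests, and the `example.com` URLs are placeholders.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTitleParser(HTMLParser):
    """Collect every link target and the page title from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl; returns {url: title} for each page fetched.

    `fetch` is any callable that returns the HTML for a URL, or None
    if the page is unavailable (an assumption of this sketch).
    """
    seen = {start_url}
    queue = deque([start_url])
    titles = {}
    while queue and len(titles) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # skip pages that could not be retrieved
        parser = LinkAndTitleParser()
        parser.feed(html)
        titles[url] = parser.title.strip()
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return titles


# A tiny made-up "web" so the sketch runs offline.
FAKE_WEB = {
    "http://example.com/": '<title>Home</title><a href="/about">About</a><a href="/menu">Menu</a>',
    "http://example.com/about": '<title>About</title><a href="/">Home</a>',
    "http://example.com/menu": '<title>Menu</title><a href="/missing">Gone</a>',
}

print(crawl("http://example.com/", FAKE_WEB.get))
```

Unlike the scraper, nothing in this code depends on the layout of a particular page, which is what lets it move from site to site. A real crawler would also need politeness delays, `robots.txt` handling, and error handling, all omitted here.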

In addition to crawlers and scrapers, we will also cover web spiders in Chapter 8, Scrapy. Spiders can be used for crawling a specific set of sites or for broader crawls across many sites or even the entire Internet.

Generally, we will use the term that best reflects each use case. As you develop your web scraping skills, you may notice that technologies, libraries, and packages draw the same distinctions. Knowing the differences between these terms will help you select an appropriate package or technology based on the terminology it uses (for example, is it only for scraping, or does it also support spiders?).
