
Search engines

One well-known use case for web scraping is indexing websites for the purpose of building a search engine. In this case, a web scraper visits different websites and follows links to other websites in order to discover all of the content available on the internet. By collecting some of the content from each page, you could respond to search queries by matching the query terms to the contents of the pages you have collected. You could also suggest similar pages by tracking how pages link to one another, and rank the most important pages by the number of connections they have to other sites. A minimal sketch of this idea follows.
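
The sketch below illustrates the idea in Go, which this book uses throughout: a tiny crawler that fetches a page, pulls out outbound links, follows them to a fixed depth, and counts inbound links as a crude importance score. The starting URL, the depth limit, and the regex-based link extraction are simplifications chosen for this example; a real crawler would use a proper HTML parser and respect politeness rules such as robots.txt.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"regexp"
)

// hrefPattern is a simplified way to pull href values out of HTML.
// A real crawler would use an HTML parser instead of a regex.
var hrefPattern = regexp.MustCompile(`href="(https?://[^"]+)"`)

// crawl fetches startURL, records every outbound link it finds, and
// recursively visits those links down to the given depth. inboundCount
// tracks how many times each page is linked to, which serves here as a
// crude importance score.
func crawl(startURL string, depth int, visited map[string]bool, inboundCount map[string]int) {
	if depth <= 0 || visited[startURL] {
		return
	}
	visited[startURL] = true

	resp, err := http.Get(startURL)
	if err != nil {
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return
	}

	for _, match := range hrefPattern.FindAllStringSubmatch(string(body), -1) {
		link := match[1]
		inboundCount[link]++
		crawl(link, depth-1, visited, inboundCount)
	}
}

func main() {
	visited := make(map[string]bool)
	inboundCount := make(map[string]int)

	// example.com is a placeholder starting point for this sketch.
	crawl("https://example.com", 2, visited, inboundCount)

	for link, count := range inboundCount {
		fmt.Printf("%d inbound link(s): %s\n", count, link)
	}
}
```

Even at this scale, the two maps capture the essentials: the visited set prevents fetching the same page twice, and the inbound-link counts are the raw material a search engine would later refine into a ranking.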

Googlebot is the most famous example of a web scraper used to build a search engine. It is the first step in building the search engine, as it downloads, indexes, and ranks each page on a website. It also follows links to other websites, which is how it is able to index a substantial portion of the internet. According to Googlebot's documentation, the scraper attempts to reach each web page every few seconds; at that rate, it must fetch an estimated several billion pages per day!

If your goal is to build a search engine, albeit on a much smaller scale, you will find enough tools in this book to collect the information you need. This book will not, however, cover indexing and ranking pages to provide relevant search results.
