
Understanding search engine crawlers

Did you ever wonder how all those pages got into the search engines in the first place? There's a magic search engine genie that flies from server to server waving a magic wand. Not really, but close. Actually, there is a computer program called a crawler (or sometimes a spider, robot, or 'bot) that lives on the search engine's servers. Its job is to surf the Web and save anything it finds. It starts by visiting sites that it already knows about and then follows any links that it finds along the way. At each site it visits, it grabs the HTML code from every page it can find and saves it on the search engine's own servers.
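
To make that crawl-and-save loop concrete, here is a minimal sketch in Python using only the standard library. The starting URL and the page limit are placeholders, and a real crawler would also respect robots.txt and crawl delays; this is only meant to illustrate "grab the HTML, save it, follow the links."

# Minimal crawler sketch: visit pages, save their HTML, queue up any links found.
from urllib.request import urlopen
from urllib.parse import urljoin, urlparse
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_pages=10):
    queue, seen, saved = [start_url], set(), {}
    while queue and len(saved) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
        except Exception:
            continue                      # skip pages that fail to load
        saved[url] = html                 # "grab the HTML and save it"
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:         # "follow any links found along the way"
            absolute = urljoin(url, href)
            if urlparse(absolute).scheme in ("http", "https"):
                queue.append(absolute)
    return saved

pages = crawl("https://example.com/")     # placeholder starting point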

Later, an indexing server takes that HTML code and examines it, parses it, filters it, analyzes it, and does some other secret stuff (a lot like waving that magic wand). Eventually, your site is saved into the search engine's index and is ready to be served up as a search result. Total time elapsed? About two minutes.
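
For a sense of what "indexing" means at its very simplest, here is a small sketch: an inverted index that maps each word to the pages containing it. This is only an illustration, not how any real search engine's index (or its secret stuff) actually works, and the sample page passed in at the end is made up. You could just as easily feed it the dictionary returned by the crawl() sketch above.

# Build a toy inverted index: word -> set of URLs where the word appears.
import re
from collections import defaultdict

def build_index(pages):
    """pages: dict mapping url -> HTML source."""
    index = defaultdict(set)
    for url, html in pages.items():
        text = re.sub(r"<[^>]+>", " ", html)          # strip tags, keep the text
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

index = build_index({"https://www.example.com/": "<html><body>Drupal SEO tips</body></html>"})
print(index.get("drupal", set()))                     # pages that mention "drupal"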

One important thing to note here is that search engine crawlers follow the same links that you do. That means if you can't click a link, there's a good chance the crawler can't follow it either. Fortunately, Google does a great job of following JavaScript links, but if you're using JavaScript for your Drupal navigation menus, chances are good that other search engines can't see much past your front page. That's where a few creative techniques come in handy. Breadcrumbs that expose your navigation as plain links, or an XML sitemap (refer to Chapter 5, Sitemaps), can help the crawler figure out where to go next. That's why those tools are sometimes called spider food.
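
For reference, the sketch below shows roughly what a minimal XML sitemap looks like and generates one in Python. The URLs are placeholders, and in practice the Drupal module covered in Chapter 5 builds this file for you; the point is simply that a sitemap is a plain list of URLs the crawler can read even when your navigation is not crawlable.

# Generate a tiny sitemap.xml following the sitemaps.org protocol.
SITEMAP_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
{entries}
</urlset>"""

ENTRY_TEMPLATE = "  <url><loc>{url}</loc><changefreq>weekly</changefreq></url>"

def build_sitemap(urls):
    entries = "\n".join(ENTRY_TEMPLATE.format(url=u) for u in urls)
    return SITEMAP_TEMPLATE.format(entries=entries)

print(build_sitemap([
    "https://www.example.com/",
    "https://www.example.com/about",
    "https://www.example.com/blog/first-post",
]))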
