
Preparing the index

A search engine does not store your web pages; it stores an index of your web pages. For your page to appear in that index, the search engine first sends a search spider to visit your site and read your web pages' content. The spider returns the information to a document processor, which converts your web pages into a format that the query processor understands. The document processor performs several formatting tasks. It might remove stop words: lower-value terms that bear little relation to the page's topic, such as "the", "and", "it", and many more. The document processor will also perform term stemming, where suffixes like -ing, -er, -es, and -ed are stripped from the page's terms so that variants of a word reduce to a common root. In essence, a document processor trims the content to reveal the contextual elements of a web page and prepares the entry for indexing.
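The two formatting tasks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word list is a tiny hypothetical sample, the suffix stripping is deliberately naive, and the function names are invented for this example. Production search engines use far larger stop-word lists and more careful stemming algorithms.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "and", "it", "a", "an", "of", "to", "in", "is"}

# The suffixes named in the text, tried in this order.
SUFFIXES = ("ing", "er", "es", "ed")


def strip_suffix(term):
    """Naive stemming: drop one known suffix if at least three letters remain."""
    for suffix in SUFFIXES:
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def process_document(text):
    """Lowercase, split into words, remove stop words, then stem what is left."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [strip_suffix(t) for t in tokens if t not in STOP_WORDS]


print(process_document("The crawler visited the boxes and it indexed the painting"))
# prints ['crawl', 'visit', 'box', 'index', 'paint']
```

Blind suffix stripping also damages some words (for example, it would turn "spider" into "spid"), which is one reason real stemmers apply additional rules.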

The index contains much of the information from your pages, along with other data that the search engine uses to evaluate and categorize your pages. As a highly simplified example, Google's index entry for your page will contain the text of your page as it stood when the spider last visited, along with other data such as the following:

  • A table of the terms that appear on your page and how often each appears (stored in a structure called the inverted file, which maps each term to the pages that contain it)
  • The page's PageRank
  • A term weight assignment: a numerical value that reflects how often a particular term appears on the page
  • The page's meta tags
  • The page's destination URL

This description is grossly simplified, but illustrates that what the search engine attempts to match is not your page itself, but a processed and analyzed version of your page.
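To make the simplified entry above concrete, here is a minimal Python sketch of one index record and a toy inverted file. Everything in it is an assumption for illustration: the class and function names are hypothetical, the PageRank value is a placeholder supplied by the caller rather than computed, and the term weight is taken to be plain relative frequency. It does not describe how Google actually stores or weights its index.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """Hypothetical record for one page, mirroring the list above."""
    url: str                 # the page's destination URL
    meta_tags: dict          # the page's meta tags
    page_rank: float         # placeholder value; not computed in this sketch
    term_counts: Counter = field(default_factory=Counter)

    def term_weight(self, term):
        """Assumed weighting: the term's share of all terms on the page."""
        total = sum(self.term_counts.values())
        return self.term_counts[term] / total if total else 0.0


def build_inverted_file(entries):
    """Map each term to the set of URLs whose pages contain it."""
    inverted = {}
    for entry in entries:
        for term in entry.term_counts:
            inverted.setdefault(term, set()).add(entry.url)
    return inverted


entry = IndexEntry(
    url="https://example.com/",
    meta_tags={"description": "demo page"},
    page_rank=0.0,
    term_counts=Counter(["index", "search", "index", "spider"]),
)
print(entry.term_counts.most_common(1))   # prints [('index', 2)]
print(entry.term_weight("index"))         # prints 0.5
print(build_inverted_file([entry])["spider"])
```

A query for "spider" would be answered from the inverted file and the stored record, not from the live page, which is the point the section makes.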
