The Inner Workings of Search Engines

A search engine works in three stages, in this order: web crawling, indexing and searching. It stores data about the web pages it retrieves from the web. The pages are retrieved by a web crawler, or spider, an automated web browser that follows every link it finds. Site owners can exclude pages from crawling with a robots.txt file. Once a page is retrieved, its content is analyzed to decide how it will be indexed, for example by key words taken from the title, headings or special fields such as meta tags.
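To make the crawling step concrete, here is a minimal sketch in Python. It assumes a tiny in-memory "web" (the PAGES dictionary, the example.com URLs and the ExampleBot name are all made up for illustration) so it runs without a network connection. A real crawler would fetch pages over HTTP and handle far more edge cases.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory "web": URL -> HTML. A real crawler would fetch over HTTP.
PAGES = {
    "http://example.com/": '<a href="/about">About</a> <a href="/private/notes">Notes</a>',
    "http://example.com/about": '<a href="/">Home</a>',
    "http://example.com/private/notes": "secret",
}

# Hypothetical robots.txt that excludes everything under /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, pages, robots_txt, user_agent="ExampleBot"):
    """Breadth-first crawl that skips URLs disallowed by robots.txt."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())

    seen = set()
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        # Skip excluded pages and links that lead nowhere.
        if not robots.can_fetch(user_agent, url) or url not in pages:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(pages[url])
        for href in parser.links:
            queue.append(urljoin(url, href))  # resolve relative links
    return visited


print(crawl("http://example.com/", PAGES, ROBOTS_TXT))
```

The crawler visits the home page and the about page, but never retrieves the page under /private/ because the robots.txt rules exclude it.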

The data is then kept in an index database. Some search engines, Google for instance, keep all or part of the source page (known as a cache) along with information about the page, while other search engines store every word of every page they find. The cached page holds the text that was actually indexed, so it remains useful when the live page has been updated and no longer contains the search terms.
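One common way to store this kind of data is an inverted index, which maps each word to the pages that contain it. The sketch below is only an illustration of the idea: the page names, the tokenizer and the build_index function are assumptions, not how any particular search engine does it.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Return (index, cache): word -> set of page ids, and page id -> stored text."""
    index = defaultdict(set)
    cache = {}
    for page_id, text in documents.items():
        cache[page_id] = text  # keep a copy of the text that was indexed
        for word in tokenize(text):
            index[word].add(page_id)
    return index, cache


# Hypothetical pages used only for this example.
documents = {
    "page1": "Search engines crawl the web",
    "page2": "A web crawler follows links",
}

index, cache = build_index(documents)
print(sorted(index["web"]))      # pages containing the word "web"
print(sorted(index["crawler"]))  # pages containing the word "crawler"
```

Because the cache keeps the text as it was at indexing time, a lookup can still show why a page matched even after the live page has changed.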

When a user searches by key word, the search engine consults the index and returns a list of web pages that best match the query. Each result usually includes the title of the page and a short summary or a few lines of its text. Most search engines also recognize Boolean operators such as AND, OR and NOT, which let users narrow or widen a search.
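Boolean searching maps neatly onto set operations over an index. The following sketch assumes a small hand-written index (INDEX, with made-up page numbers) and a simplified query syntax that is evaluated strictly left to right with no parentheses. Real search engines support richer syntax than this.

```python
# Hypothetical index: word -> set of page ids that contain it.
INDEX = {
    "search": {1, 2, 3},
    "engine": {1, 3},
    "crawler": {2, 3},
    "index": {2},
}


def boolean_search(query, index):
    """Evaluate a query such as 'search AND crawler NOT index' left to right."""
    tokens = query.split()
    if not tokens:
        return set()
    # Start with the pages that match the first term.
    result = set(index.get(tokens[0].lower(), set()))
    i = 1
    while i + 1 < len(tokens):
        operator = tokens[i].upper()
        postings = index.get(tokens[i + 1].lower(), set())
        if operator == "AND":
            result &= postings   # pages must contain both terms
        elif operator == "OR":
            result |= postings   # pages may contain either term
        elif operator == "NOT":
            result -= postings   # drop pages that contain the term
        else:
            raise ValueError(f"unknown operator: {tokens[i]}")
        i += 2
    return result


print(boolean_search("search AND crawler", INDEX))
print(boolean_search("search AND crawler NOT index", INDEX))
print(boolean_search("engine OR index", INDEX))
```

AND narrows the result to pages that contain both words, OR widens it, and NOT removes pages that contain the unwanted word.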

How useful a search engine is depends on how relevant its results are. Millions of web sites may contain the words a user searches for, so results are ordered by factors such as how popular a site is and how relevant its terms are. Many web sites also pay for advertising on search engines, so a user may see those sponsored listings first because the site has paid to appear for certain key words. Separately, Search Engine Optimization (SEO) is used to help a site rank higher in the regular results.
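As a rough illustration of combining relevance with popularity, the sketch below scores each page by how often the query words appear (title matches count more) and boosts the score by the number of inbound links. The formula, the weights, the inbound_links field and the sample pages are all assumptions made for this example. Real search engines use many more signals and do not publish their exact formulas.

```python
import math

# Hypothetical pages; "inbound_links" stands in for popularity.
PAGES = [
    {"url": "a.example", "title": "Search engines explained",
     "body": "how a search engine builds an index", "inbound_links": 50},
    {"url": "b.example", "title": "Cooking tips",
     "body": "search for recipes", "inbound_links": 500},
    {"url": "c.example", "title": "Gardening",
     "body": "plants and soil", "inbound_links": 1000},
]


def relevance(page, terms, title_weight=3):
    """Count exact word matches; a match in the title counts title_weight times."""
    title_words = page["title"].lower().split()
    body_words = page["body"].lower().split()
    return sum(title_weight * title_words.count(t) + body_words.count(t)
               for t in terms)


def rank(pages, query):
    """Return URLs of matching pages, best score first."""
    terms = query.lower().split()
    scored = []
    for page in pages:
        rel = relevance(page, terms)
        if rel == 0:
            continue  # pages with no matching words are never listed
        # Assumed formula: relevance boosted by the log of popularity.
        popularity_boost = 1 + math.log1p(page["inbound_links"])
        scored.append((rel * popularity_boost, page["url"]))
    scored.sort(reverse=True)
    return [url for _, url in scored]


print(rank(PAGES, "search engine"))
```

In this example the most popular page is not listed at all because it contains none of the query words, while the page that matches in both title and body outranks a more popular page with a single match.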
