Behind the search engine

Improving positions in the natural listings depends on marketers understanding the process by which search engines compile an index: they send out spiders or robots to crawl the sites registered with the search engine. The technology used to create the natural listings involves these main processes:

Crawling

The purpose of crawling is to identify relevant pages for indexing and to assess whether they have changed. Crawling is performed by robots (bots), also known as spiders, which access web pages and retrieve a reference URL for each page for later analysis and indexing.

Although the terms ‘bot’ and ‘spider’ give the impression of something physically visiting a site, the bots are simply software processes running on a search engine’s servers that request pages, follow the links contained on each page and so create a series of page references with associated URLs. This is a recursive process: each link followed uncovers additional links, which then need to be crawled in turn.
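To make the recursion concrete, the sketch below follows links breadth-first from a seed URL using only Python’s standard library. It is a minimal illustration, not a production crawler: names such as crawl and max_pages are invented here, and real crawlers add error handling, politeness delays and robots.txt checks.

```python
# Minimal breadth-first crawler sketch (illustrative names; error
# handling and politeness rules such as robots.txt are omitted).
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_url, max_pages=50):
    """Fetch pages starting from seed_url, following links recursively."""
    seen, queue, pages = set(), deque([seed_url]), {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        html = urlopen(url).read().decode("utf-8", errors="ignore")
        pages[url] = html                      # page reference + content
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:              # each link found must
            queue.append(urljoin(url, link))   # itself be crawled
    return pages
```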

Indexing

An index is created to enable the search engine to find rapidly the most relevant pages containing the query typed by the searcher. Rather than scanning each page for the query phrase, the search engine ‘inverts’ the document collection to produce a lookup table of the documents containing particular words.
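A toy illustration of that inversion, assuming the pages dictionary produced by the crawler sketch above (the function name build_index is illustrative):

```python
# Sketch of building an inverted index: rather than scanning every
# document for a query word, we map each word to the documents
# containing it.
from collections import defaultdict

def build_index(pages):
    """pages: dict mapping URL -> page text. Returns word -> set of URLs."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

# Lookup is now a single dictionary access instead of a full scan:
# index["marketing"] gives every URL whose text contains that word.
```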

The index information consists of the phrases stored within a document along with other information characterising the page, such as the document’s title, meta description, PageRank, trust and authority, spam rating and so on. For each keyword in the document, additional attributes are stored, such as semantic markup (H1 and H2 headings, denoted within the HTML), occurrence in link anchor text, proximity, frequency or density, and position in the document. The words contained in anchor text ‘pointing’ to a page are particularly important in determining search rankings.
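Those per-term attributes can be sketched by enriching each index entry, as below. The Posting fields here (frequency, positions, in_title, in_anchor) are invented to mirror the signals just described; a real engine’s schema is far richer.

```python
# Extending the index so each word maps to per-document attributes:
# frequency, positions in the document, and whether the term appears
# in the title or in anchor text pointing at the page.
from dataclasses import dataclass, field

@dataclass
class Posting:
    frequency: int = 0
    positions: list = field(default_factory=list)
    in_title: bool = False
    in_anchor: bool = False

def index_document(index, url, words, title_words=(), anchor_words=()):
    """index: word -> {url: Posting}. Adds one document's terms."""
    for pos, word in enumerate(words):
        posting = index.setdefault(word, {}).setdefault(url, Posting())
        posting.frequency += 1
        posting.positions.append(pos)
        posting.in_title = posting.in_title or word in title_words
        posting.in_anchor = posting.in_anchor or word in anchor_words
```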

Ranking or scoring

The indexing process has produced a lookup of all the pages containing the particular words in a query, but these are not yet sorted by relevance. Ranking, which assembles the most relevant set of documents to return in the SERPs, occurs in real time for each search query entered. First, relevant documents are retrieved from a runtime version of the index at a particular data centre; then a rank in the SERPs is computed for each document based on many ranking factors.
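A hedged sketch of scoring a single document against a query, using the Posting structure above. The weights are invented purely for illustration (anchor text weighted above title, title above body frequency, echoing the emphasis in the text); real engines combine hundreds of signals.

```python
# Score one document for a query by combining a few ranking factors.
# The weights below are illustrative only.
def score(query_words, url, index):
    total = 0.0
    for word in query_words:
        posting = index.get(word, {}).get(url)
        if posting is None:
            continue
        total += posting.frequency           # raw term frequency
        if posting.in_title:
            total += 5.0                     # title match weighted higher
        if posting.in_anchor:
            total += 10.0                    # anchor text weighted highest
    return total
```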

Query request and result serving

The familiar search engine interface accepts the searcher’s query. The user’s location is assessed from their IP address and the query is passed to a relevant data centre for processing. Ranking then occurs in real time for the query to produce a sorted list of relevant documents, which are displayed on the search results page.
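Tying the illustrative index and score sketches together, a simplified end-to-end query handler might look like the following. The routing of a query to a data centre by IP address is omitted; this runs against a single local index.

```python
# End-to-end query handling sketch: retrieve candidate documents from
# the index, score each, and return a sorted "results page".
def serve_query(query, index, top_k=10):
    words = query.lower().split()
    candidates = set()
    for word in words:                       # retrieve from the index
        candidates |= set(index.get(word, {}))
    ranked = sorted(candidates,              # rank in real time
                    key=lambda url: score(words, url, index),
                    reverse=True)
    return ranked[:top_k]
```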

Adapted from

Chaffey, D. and Ellis-Chadwick, F., 2012. Digital marketing: strategy, implementation and practice. 5th ed. Harlow: Pearson.
