Although a search engine robot may visit the home page of a site, it will not necessarily crawl all pages or assign them equal weight in terms of PageRank or relevance. So when auditing sites as part of an SEO initiative, SEO agencies will check how many pages are included within the search engine index for different search engines. This is known as index inclusion.
Incomplete index inclusion can have several causes:
- Technical reasons why search robots do not crawl all the pages, such as the use of an SEO-unfriendly content management system that generates complex URLs.
- Pages identified as spam, judged to be of low importance or considered duplicate content, which Google formerly held in what was known as the supplemental index, where pages do not rank as highly. In these cases it is often best to add a specific 'canonical' link element to the page head, which tells the search engine which version is the primary page (see the sketch after this list). If you are a multinational company with different content sites for different countries, it is challenging to deliver relevant content for local audiences; using regional domains tends to work best.
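A declared canonical takes the form of a `<link rel="canonical" href="...">` element in the page head, and it can be audited programmatically. Below is a minimal sketch using only Python's standard library; the page URL is a hypothetical placeholder, not one taken from the text above:

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class CanonicalParser(HTMLParser):
    """Collects the href of any <link rel="canonical"> element on the page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "canonical":
            self.canonical = attrs.get("href")


# Hypothetical URL used for illustration only (e.g. a page with session IDs
# that could otherwise be indexed as a duplicate of the clean URL).
url = "https://www.example.com/products/widget?sessionid=123"
parser = CanonicalParser()
parser.feed(urlopen(url).read().decode("utf-8", errors="replace"))

if parser.canonical:
    print(f"Canonical URL declared: {parser.canonical}")
else:
    print("No canonical element found; the page may compete with duplicates.")
```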
Index inclusion can be assessed by:
- Reviewing web analytics data, typically server log files, which will show the frequency with which the main search robots crawl a site (see the log-parsing sketch after this list).
- Using web analytics referrer information to find out which search engines a site's visitors originate from, and which pages are most popular (also covered in the log-parsing sketch below).
- Checking the number of pages that have been successfully indexed on a site. For example, in Google the search 'site:mudu.io' lists all the pages of the site indexed by Google and reports the total number at the top of the SERPs, while 'inurl:www.mudu.io' restricts results to indexed URLs containing that string. The indexed total is most meaningful when compared with the number of pages the site actually publishes (see the sitemap sketch after this list).
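As a rough illustration of the first two checks, the sketch below scans a server access log for robot crawls and search-engine referrals. It is a minimal sketch that assumes the common Apache/Nginx 'combined' log format and a local file named access.log; both the filename and the list of robots are illustrative assumptions:

```python
import re
from collections import Counter

# Apache/Nginx "combined" log format (an assumption for this sketch):
# IP - - [timestamp] "METHOD path HTTP/x" status size "referrer" "user-agent"
LINE = re.compile(
    r'"[A-Z]+ (?P<path>\S+) [^"]*" \d+ \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

CRAWLERS = ("Googlebot", "bingbot", "DuckDuckBot", "YandexBot")
SEARCH_REFERRERS = ("google.", "bing.", "duckduckgo.", "yandex.")

crawler_hits = Counter()    # how often each robot crawls the site
referral_engines = Counter()  # which search engines visitors come from
referral_pages = Counter()    # most popular landing pages from search

with open("access.log") as log:  # hypothetical filename
    for line in log:
        match = LINE.search(line)
        if not match:
            continue
        path, referrer, agent = match["path"], match["referrer"], match["agent"]
        for bot in CRAWLERS:
            if bot in agent:
                crawler_hits[bot] += 1
        for engine in SEARCH_REFERRERS:
            if engine in referrer:
                referral_engines[engine.rstrip(".")] += 1
                referral_pages[path] += 1

print("Crawl frequency by robot:", dict(crawler_hits))
print("Visits referred by search engine:", dict(referral_engines))
print("Most popular pages from search:", referral_pages.most_common(5))
```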
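For the third check, the count reported by a 'site:' search is only half the comparison; the other half is how many pages the site actually exposes. A minimal sketch of that baseline follows, assuming the site publishes a standard XML sitemap at a hypothetical URL:

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Hypothetical sitemap location; most sites link theirs from robots.txt.
SITEMAP_URL = "https://www.example.com/sitemap.xml"
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

tree = ET.parse(urlopen(SITEMAP_URL))
urls = [loc.text for loc in tree.findall(".//sm:loc", NS)]

print(f"Pages listed in sitemap: {len(urls)}")
# Compare this figure with the total reported by a 'site:' search;
# a large shortfall suggests an index-inclusion problem.
```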
Chaffey, D. and Ellis-Chadwick, F. (2012) Digital Marketing: Strategy, Implementation and Practice, 5th edn. Harlow: Pearson.