If you have a larger-than-usual website and you want to improve your traffic with SEO, crawl optimisation should always be your first priority. Once you start monitoring and understanding how Googlebot crawls your site, you will begin to see how and where improvements can be made.
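A simple way to start that monitoring is to look for Googlebot in your server's access logs. The sketch below is a minimal example, assuming an Nginx/Apache combined log format and a log path of /var/log/nginx/access.log (both assumptions you would adjust for your own setup); it simply counts how often each URL is requested by a user agent claiming to be Googlebot.

```python
import re
from collections import Counter

# Assumed log location and combined log format (Nginx/Apache); adjust for your server.
LOG_PATH = "/var/log/nginx/access.log"
LINE_RE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) ([^ "]+)[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)

googlebot_hits = Counter()

with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = LINE_RE.match(line)
        if not match:
            continue
        path, user_agent = match.groups()
        # For anything serious, verify Googlebot with a reverse DNS lookup rather
        # than trusting the user-agent string, which is trivially spoofed.
        if "Googlebot" in user_agent:
            googlebot_hits[path] += 1

# The URLs Googlebot requests most often, and, by omission, the ones it rarely touches.
for path, hits in googlebot_hits.most_common(20):
    print(f"{hits:5d}  {path}")
```

The pages that never appear in this output are usually the ones that stand to gain the most from the fixes discussed later in this article.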
The importance of crawl optimisation dates back to 2010, when Google introduced the Caffeine update. This update was a game-changer for how Google handled and indexed content across the web. Faster crawling and indexing was essential for Google to be able to provide searchers with answers to their queries in real time.
Since the Caffeine update, Google has been able to index a web page as soon as it is discovered, giving priority to pages deemed important and ensuring they remain fresh in the index.
To achieve this, each website is allocated what is known as a “crawl budget”: the number of pages that Googlebot will crawl each time it visits your website. Your crawl budget is determined by how important Google deems your website to be, and it can be affected by factors such as host load and duplicate content. For instance, if Googlebot has crawled three pages from your website but two of them duplicate the first, it may drop the two duplicates and allow just the original into the index. This also signals to Google that the website doesn’t have great content, so it may not be crawled as frequently.
Crawl rate is also a factor that some SEOs believe plays a part in determining how a page will rank. In other words, web pages crawled every two weeks will generate more traffic than web pages crawled every month, so the theory goes that increasing the crawl rate of your website will increase its traffic. Web pages that are not crawled very often tend to be those with little to no authority, which is where crawl optimisation can make a significant difference across a large number of pages.
Websites that tend to suffer from poor crawl optimisation are large ecommerce websites that use pagination for products, sites with poor architecture or internal linking, and sites that use templated content on their landing pages.
The easiest way to tell whether your website is suffering from poor crawl optimisation is to check your pages in Google’s cache by typing “cache:” before the URL in the browser’s address bar, e.g. cache:www.example.com/page. If a cached copy exists, you will see a snapshot of the page along with the date Google last retrieved it.
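If you want to spot-check a handful of URLs rather than typing the operator by hand, a short script can request the cached copy for you. The sketch below is an illustration only: it assumes the public webcache.googleusercontent.com endpoint, and Google may throttle or block automated requests, so keep it to a few URLs rather than a full crawl.

```python
import urllib.error
import urllib.parse
import urllib.request

def check_google_cache(url):
    """Request Google's cached copy of a URL and return the HTTP status code.

    A 200 usually means a cached snapshot was returned; a 404 suggests there is
    no cached version. This is a rough spot check, not a supported API.
    """
    cache_url = ("https://webcache.googleusercontent.com/search?q=cache:"
                 + urllib.parse.quote(url, safe=""))
    request = urllib.request.Request(cache_url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code

# Hypothetical URL purely for illustration.
print(check_google_cache("https://www.example.com/category/widgets/"))
```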
From my own observations, if a web page hasn’t been crawled in the last three months, it will not show a cached version in Google’s index and therefore will not rank well in Google search results, if at all.
You can use the “Fetch as Google” tool in Search Console (formerly Webmaster Tools) to ask Googlebot to crawl any pages on your website that have fallen out of Google’s index, although there is a monthly quota of 500 submissions, so this tool alone is not enough to solve crawl optimisation problems.
In conclusion, if you believe that your website is suffering from poor crawl optimisation, there are several steps you can take to influence your crawl rate, including using a decent hosting provider, blogging frequently, creating sitemaps and avoiding duplicate content. De-paginate your website so that your site architecture is as flat as possible, and make sure that your internal linking structure is fully optimised, as this is one of the primary ways of telling Google which pages are important to you. You can also make use of robots.txt and the URL Parameters tool in Search Console (formerly Webmaster Tools); these will help ensure Google isn’t wasting time crawling pages on your website that you don’t care about.
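As an illustration of that last point, a few lines of robots.txt can keep Googlebot away from the crawl-budget sinks that most large sites generate. The paths and parameter names below are hypothetical; replace them with the internal search, session and faceted-navigation URLs your own site actually produces, and be careful not to block anything you want indexed.

```
# Hypothetical rules; substitute the low-value URL patterns your site generates.
User-agent: *
Disallow: /search/          # internal site-search result pages
Disallow: /*?sessionid=     # session-ID duplicates of existing pages
Disallow: /*?sort=          # sort-order variations of category pages

Sitemap: https://www.example.com/sitemap.xml
```

Pair rules like these with the URL Parameters tool for parameters you can’t pattern-match cleanly, and test any changes in a robots.txt testing tool before deploying them.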