Crawl Budget – It’s really a simple SEO concept

The phrase “crawl budget” is an SEO term that comes up frequently in discussions about technical SEO, but it is typically used incorrectly. Most of the time, when people refer to crawl budget they treat it as a technical SEO enhancement to improve the way Google understands a website. In fact, it is far simpler than that: it is simply a budget.

The best way of understanding various aspects of Google’s algorithms is to view them from a financial standpoint. Crawling and indexing the web is a very expensive proposition, and Google was able to beat out every other search engine on its way to dominance because it figured out how to do that before the money ran out.

While it would be ideal for Google’s crawlers to simply gobble up the entire web in one fell swoop, that is technically impossible. Crawlers need to literally crawl through the web, discovering link after link, and as they land on each page they need to store a copy of it in the database.
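To make that loop concrete, here is a minimal sketch of such a crawler in Python, using only the standard library. The seed URL, the page budget, and the fetch details are illustrative assumptions on my part, not a description of how Googlebot actually works:

    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class LinkExtractor(HTMLParser):
        """Collects href targets from the anchor tags on a fetched page."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def crawl(seed_url, page_budget=10):
        """Follow link after link, storing a copy of each page, until the budget runs out."""
        frontier = deque([seed_url])   # URLs discovered but not yet fetched
        seen = {seed_url}
        database = {}                  # stand-in for the search engine's page store
        while frontier and len(database) < page_budget:
            url = frontier.popleft()
            try:
                html = urlopen(url, timeout=5).read().decode("utf-8", errors="replace")
            except (OSError, ValueError):
                continue               # unreachable or non-HTTP links just cost the attempt
            database[url] = html       # build a copy of the page into the database
            extractor = LinkExtractor()
            extractor.feed(html)
            for link in extractor.links:
                absolute = urljoin(url, link)
                if absolute not in seen:
                    seen.add(absolute)
                    frontier.append(absolute)
        return database

Every fetch in that loop costs real bandwidth and compute, which is exactly why a budget has to cap it somewhere.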

It’s about the Benjamins!

In the early days of search, while Google was still living on venture capital money, the engineers needed to come up with a way to efficiently crawl the web without going broke in the process. Their solution was to decide how much “budget” each site was allocated based upon its importance to Google and the web as a whole. That allocation is crawl budget.

If a site is very important to the ecosystem, Wikipedia for example, Google would have wanted to allocate a lot of their hypothetical dollars to crawling as much of the site as they could. Alternatively, a brand new website with no authority on the web would be allocated a significantly smaller amount of budget.

New websites

This all makes logical sense. Taking the logic one step further, if a brand new website had thousands of pages but only a few of them were valuable, it is very likely that its budget would be eaten up by the crawler ingesting the lower quality pages without ever reaching the good ones.

The best approach for a website in this position is to simply declare – via noindex directives or canonical tags – which pages are lower quality so that the crawlers can just skip them.
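As a rough sketch of what those declarations look like, here is a hypothetical page marked up with both signals, along with a few lines of Python (standard library only) that read the directives the way a crawler might. The page markup and URLs are invented for illustration:

    from html.parser import HTMLParser

    # A hypothetical low-value page declaring both signals in its <head>:
    PAGE = """
    <html><head>
      <meta name="robots" content="noindex">
      <link rel="canonical" href="https://example.com/preferred-page">
    </head><body>A thin page the crawler can skip.</body></html>
    """

    class DirectiveReader(HTMLParser):
        """Pulls the robots meta directive and the canonical target out of a page."""
        def __init__(self):
            super().__init__()
            self.robots = None
            self.canonical = None

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "meta" and attrs.get("name") == "robots":
                self.robots = attrs.get("content")
            elif tag == "link" and attrs.get("rel") == "canonical":
                self.canonical = attrs.get("href")

    reader = DirectiveReader()
    reader.feed(PAGE)
    print(reader.robots)     # noindex -> do not spend budget indexing this page
    print(reader.canonical)  # https://example.com/preferred-page -> the page that matters

Both signals point the crawler away from pages that shouldn’t compete for budget.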

A happy example

To illustrate this with an example, think of a website like a Happy Meal with a toy inside. You have a certain amount of daily budget to buy Happy Meals, but you only need the unique toys to complete a series. The only way to find out whether a toy is unique is to buy the meal and open the box. So, every time a Happy Meal is bought and a duplicate toy shows up, that day’s budget is wasted – unless you were very hungry. The most efficient way to collect the toys would be to show the name of the toy on the outside of the box, so you could choose only the boxes you need.
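A quick simulation makes the waste measurable. The toy names and trial counts below are invented purely for illustration; the point is the gap between buying blind and buying labeled boxes:

    import random

    TOYS = ["pirate", "robot", "dragon", "wizard"]  # the series we want to complete

    def blind_boxes():
        """Buy meals at random until every toy in the series has shown up."""
        collected, meals = set(), 0
        while len(collected) < len(TOYS):
            collected.add(random.choice(TOYS))
            meals += 1
        return meals

    def labeled_boxes():
        """With the toy's name printed on the box, one meal per toy is enough."""
        return len(TOYS)

    random.seed(7)
    trials = [blind_boxes() for _ in range(10_000)]
    print(sum(trials) / len(trials))  # roughly 8.3 meals on average (coupon collector)
    print(labeled_boxes())            # exactly 4 meals

Blind buying costs roughly twice the meals for a four-toy series, and the gap only widens as the series grows.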

Continuing this Happy Meal to website analogy, those noindex directives and canonical tags are the best way of informing a search engine to ignore a particular box. The crawler then has a better sense of how to most efficiently spend its limited budget.

Crawl budget summary

This idea of crawl budget applies to every website on the web regardless of authority; it’s just that more authoritative websites have more budget to be expended by the crawler. As websites gain authority, likely via links or other user engagement signals, their budget will expand, but without those signals there is no other way to get more budget.

Google refers to this as “crawl demand” and while they don’t specifically mention authority in their blog post on crawl budget, they sort of beat around it by calling it “popularity”:

Even if the crawl rate limit isn’t reached, if there’s no demand from indexing, there will be low activity from Googlebot. The two factors that play a significant role in determining crawl demand are:

  • Popularity: URLs that are more popular on the Internet tend to be crawled more often to keep them fresher in our index.
  • Staleness: our systems attempt to prevent URLs from becoming stale in the index.

This idea of budget was a key component of Google’s crawling algorithm, and it still exists today, although the budget is vastly expanded. Google now has far more money and resources to crawl the web, but the web is also bigger and more complicated.

Crawl budget today

One other change is that the budget was likely initially calculated in small amounts of kilobytes, which equated to a certain number of pages. That budget can be eaten up faster if a site has dynamic scripts that are more expensive for the crawler to run.
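Some back-of-the-envelope arithmetic illustrates the effect. All of the numbers below are hypothetical, but they show how script-heavy pages drain a byte-denominated budget faster:

    # Hypothetical numbers: a byte-denominated budget buys fewer pages as pages get heavier.
    budget_bytes = 50 * 1024 * 1024           # suppose the crawler allots ~50 MB to a site

    static_page = 40 * 1024                   # a lean, server-rendered HTML page
    script_heavy_page = 400 * 1024            # HTML plus scripts the crawler must also fetch and run

    print(budget_bytes // static_page)        # 1280 pages crawled on the same budget
    print(budget_bytes // script_heavy_page)  # 128 pages -- the budget is eaten 10x faster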