In April of this year, Google experienced a bug where new pages stopped getting indexed. In a first for Google, they published a detailed synopsis of what happened.
“Most of the time, our search engine runs properly. Our teams work hard to prevent technical issues that could affect our users who are searching the web, or webmasters whose sites we index and serve to users. Similarly, the underlying systems that we use to power the search engine also run as intended most of the time. When small disruptions happen, they are largely not visible to anyone except our teams who ensure that our products are up and running. However, like all complex systems, sometimes larger outages can occur, which may lead to disruptions for both users and website creators.”
There are many interesting nuggets of information in this debrief that support some commonly held beliefs around SEO and debunk others.
1) The indexation part of the algo is DISCONNECTED from the crawling algorithm. Clearly Google did not stop crawling new pages – they were just not getting pushed to the index.
2) The index is static – it needs to be “pushed” to datacenters and is not constantly in flux as many think. The index does not get updated on the fly; it is a static library that needs to be refreshed at intervals.
3) Search algorithms change frequently, far more often than is publicly disclosed or theorized on social media. Google was pushing an update on April 5th that did not coincide with a known algo update. Google says they update their algorithm over 500 times per year, which is more than once per day. The fact that Google was “pushing” an index means that the algorithm that accessed that index would also have to have been adjusted.
4) Search Console is a live look at data in the index – it wouldn’t have broken if it were disconnected from the index. Search Console is one of my favorite search tools, as it gives a look into data that we could not possibly see from any other source.
5) There is a “duplicate management system” as a part of the indexation algorithm, NOT the crawling algorithm. If this process is not run in real time, that would explain why duplicate content with a canonical can rank for short periods of time.
6) Google really does want websites to be successful in their index, and to that end they try to give as much information as possible to optimize sites for Google search. In the battle for search visibility, there is no US vs THEM: both Google and website owners want the same thing – to satisfy users.