Links are a critical part of Google’s ranking algorithms: a link to a page is a vote of popularity and, at times, contextual relevance. The authority lent by an inbound link doesn’t just come from external sites linking in; the same applies to internal links (links between pages within a site) too. A website draws its overall authority score – PageRank, as Google’s ranking patents refer to it – from the sum of the authority of all the sites that link into it.
The best way of explaining this is to use the words from Sergey Brin and Larry Page’s original research:
Academic citation literature has been applied to the web, largely by counting citations or backlinks to a given page. This gives some approximation of a page’s importance or quality. PageRank extends this idea by not counting links from all pages equally, and by normalizing by the number of links on a page. PageRank is defined as follows: We assume page A has pages T1…Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows: PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn)) Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages’ PageRanks will be one.
To the layman this is just saying that each page begins with a score of 1, but its final score is a function of the scores of the pages that link into it, with each linking page’s score divided evenly among that page’s outbound links.
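The quoted formula can be sketched in a few lines of code. This is a minimal illustration, assuming a tiny hypothetical four-page site represented as an adjacency list; the page names are invented for the example.

```python
# Hypothetical site: each page mapped to the pages it links out to.
links = {
    "home": ["about", "blog", "contact"],
    "about": ["home"],
    "blog": ["home", "about"],
    "contact": [],
}

def pagerank(links, d=0.85, iterations=50):
    """Iteratively apply PR(A) = (1-d) + d * sum(PR(T)/C(T))."""
    pr = {page: 1.0 for page in links}  # every page starts with a score of 1
    for _ in range(iterations):
        new_pr = {}
        for page in links:
            # Pages T1..Tn that point to this page (its "citations").
            inbound = [p for p, outs in links.items() if page in outs]
            new_pr[page] = (1 - d) + d * sum(
                pr[p] / len(links[p]) for p in inbound
            )
        pr = new_pr
    return pr

scores = pagerank(links)
# The homepage ends with the highest score because every page links to it.
```

Running this, the homepage accumulates the most authority, which matches the observation below about the most-linked page on a site.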
In this calculation, the most-linked page on a website will tend to be its homepage, which then distributes that authority throughout the rest of the site. Pages that are close to the homepage, or linked frequently from pages the homepage links to, will score higher. Achieving the right mix via internal linking is therefore critical.
Inbound link authority
Additionally, the homepage will never be the only page that receives authoritative external links, so if an internal page receives a powerful external link but doesn’t link out to other pages, that external link is essentially wasted. When pages link to each other, the authority of all external links is funneled around the site to the benefit of every page.
For sites with flat architecture or only a handful of pages, a proper internal link structure is simple and straightforward, but on large sites improving internal links can be as powerful as acquiring authoritative external links. (A large site can even be one that has only a hundred pages.)
Large site challenge
Due to the way many large sites are structured, there will invariably be orphaned pages – pages that have few or no links pointing into them. Even a media site like a blog or daily news site with a very clean architecture – each post or article lives under a specific day – will have an internal linking challenge.
More than likely, the site wants organic traffic beyond people searching for that day’s or recent news. There will be posts the site hopes will remain highly visible many years into the future. Think of a review of a product on its launch day, which stays relevant as long as the product is on the shelf, or a well-researched piece explaining how something works – how the Electoral College works, for example. Granted, these posts were published on a certain day, but they remain relevant for many queries essentially forever.
Ideal link architecture
As you might imagine, for sites with this challenge, creating an ideal link architecture that flows authority around the site can have a huge impact on overall traffic, as orphaned or weakly linked pages join the internal link graph.
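One practical way to find these orphaned or weakly linked pages is a breadth-first traversal from the homepage over a crawled link graph. The sketch below assumes you already have crawl output as an adjacency list; the page names are hypothetical.

```python
from collections import deque

# Hypothetical crawl output: each page mapped to the pages it links to.
site = {
    "home": ["category-a", "category-b"],
    "category-a": ["post-1", "post-2"],
    "category-b": ["post-3"],
    "post-1": ["home"],
    "post-2": [],
    "post-3": [],
    "old-review": [],  # published years ago, nothing links to it
}

def audit(site, root="home"):
    """Return (click depth per reachable page, set of orphaned pages)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in site.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    orphans = set(site) - set(depth)
    return depth, orphans

depths, orphans = audit(site)
# "old-review" is unreachable from the homepage, so it surfaces as an orphan.
```

The click-depth numbers double as a proxy for how much homepage authority reaches each page: the deeper the page, the weaker the flow.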
How to improve the link graph
Implementing a related-pages algorithm on each page – quite simply, a module of related links that crosslinks other pages – can go a long way toward supporting this link flow, but only if the algorithm isn’t too tightly tuned to a specific relationship. When these algorithms key off specific connections between pages, they create heavy internal linking between popular topics while still leaving other pages orphaned or near-orphaned.
There are three possible ways to overcome this effect:
- Add a set of random links to the algorithm’s output and either hard-code these random offerings into the page or refresh them whenever the cache updates. Updating on every page load might be resource intensive; refreshing as infrequently as once a day achieves the same outcome.
- In addition to related pages, include a linking module for ‘interesting’ content – driven by pure randomization – also refreshed as in the first recommendation.
- Include a module on every page for the most recent content, which ensures that older pages link into new pages.
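The first two recommendations above can be sketched together: blend tightly related links with a few purely random picks, and cache the result for a day so the module stays cheap to serve. All names, counts, and the cache interval here are hypothetical choices, not a prescribed implementation.

```python
import random
import time

_cache = {}               # page -> (timestamp, rendered links)
REFRESH_SECONDS = 86400   # refresh random picks at most once per day

def link_module(page, related, all_pages, n_related=4, n_random=2):
    """Return the links to render in a page's crosslink module."""
    cached = _cache.get(page)
    if cached and time.time() - cached[0] < REFRESH_SECONDS:
        return cached[1]
    # Random candidates exclude the page itself and its related links,
    # so the module reaches pages the related algorithm would never pick.
    candidates = [p for p in all_pages if p != page and p not in related]
    picks = random.sample(candidates, min(n_random, len(candidates)))
    links = related[:n_related] + picks
    _cache[page] = (time.time(), links)
    return links
```

Because the random picks are cached, every visitor (and every crawl) within the refresh window sees the same links, which keeps the crawled link graph stable between refreshes.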
As an aside, I also like to build an HTML sitemap for every large site, as this provides one place where every single page is linked. If the sitemap is linked in the footer, it will achieve the goal of having most pages just one click from the homepage. To be transparent, Google’s John Mueller has suggested that HTML sitemaps aren’t necessary, but I have always found that on large sites they can be very powerful.
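Generating such a sitemap is straightforward if you can enumerate every URL on the site (from a crawl or the CMS database). A minimal sketch, with the function name and sample URLs invented for illustration:

```python
from html import escape

def html_sitemap(urls, title="Sitemap"):
    """Render a flat HTML list linking every URL on the site."""
    items = "\n".join(
        f'  <li><a href="{escape(url)}">{escape(url)}</a></li>'
        for url in sorted(urls)
    )
    return f"<h1>{escape(title)}</h1>\n<ul>\n{items}\n</ul>"

page = html_sitemap(["/blog/post-1", "/about", "/blog/post-2"])
```

On a very large site you would typically paginate this into alphabetical or category chunks, but the principle is the same: one crawlable place where every page is linked.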
Visualizing internal link graphs
To visualize what a desired structure of internal linking should be, I tend to think of a site’s link graph like an airline route map.
The least effective internal link graph looks like the route map of a national carrier for a small country. These carriers have a single hub in their capital city, with spokes pointing around the world from that hub. Consider the route map of Singapore Airlines, which has impressive reach for a flag carrier: with only a few exceptions, all their flights terminate in Singapore.
Flipping this visual over to websites, think of the hub as the homepage. The homepage links out to all of the other pages, but very few of the internal pages link to other pages.
The most common type of link graph looks like the route map of a large global carrier. Look at United Airlines as an example. There are very clear hubs (San Francisco, Los Angeles, Chicago, Newark, Houston, Denver…) and these hubs connect to each other and other smaller satellite cities.
Again, flipping this over to websites, the homepage would be the biggest city on the route map – Newark – which links to all the other big cities in addition to all the hubs. The other hubs would be important category pages with a lot of inbound links, which then link out to all the other smaller pages. In this link graph, important but smaller pages would have only one pathway leading to them. (As an example, Mumbai is only connected to Newark.)
The ideal internal link graph looks like the route map of a budget airline that thrives on point-to-point connections. To the bicoastal business traveler this route map makes no sense, but the wandering tourist can get anywhere they need to go as long as they can handle many stopovers. Southwest Airlines is a great example of this structure.
Southwest has such a complicated route map that they don’t even show it on their website; you have to choose a particular city and then see all the places you can reach directly. There are certainly some more popular cities within their network, but their direct flights almost seem random. A traveler can fly directly from Cleveland to major travel gateways like Atlanta, Chicago and Dallas, but they can also go to Nashville, St. Louis, Tampa and Milwaukee.
This is how a website should be structured. Pages should link to important pages, but also to other, seemingly random pages – and those pages should link back to important pages and on to other random pages of their own.
To summarize, think of a search engine crawler passing from one page to another calculating authority as a traveler intent on flying to every city on an airline’s route map without ever needing to go to a single city more than once.
On Singapore Airlines, a traveler could get from Mumbai to Frankfurt via Singapore, but to then reach Paris (without a codeshare) they would need to go back through Singapore.
On United Airlines, a traveler could get from Portland to Dallas via Denver and then could go on to Fort Lauderdale via Houston. They would certainly make it to a number of cities, but at some point they would find themselves connecting through Houston or Denver again.
On Southwest Airlines, a traveler could begin their journey in Boise, Idaho on any one of the ten non-stop flights and make it to nearly every city without ever needing to repeat a city.
Build your internal link architecture like the Southwest Airlines route map and you will never have an orphaned or sub-optimally linked page again.