When avoiding the average is best
Wednesday, November 5, 2008 18:55
For the most effective and natural SEO it is often best to keep close to the average (e.g., the proportion of same-language pages linking to a site) and to allow a little natural variation around it, but in other cases staying near the average is highly unnatural. This is due to something known to scientists as Zipf’s law, the power law or the Pareto principle, and a little more widely known as the 80/20 rule. The 80/20 rule claims that in many situations resources are not distributed evenly but that a minority of the population gets a majority share: the richest 20% have 80% of the wealth, or the biggest 20% of companies in a sector have an 80% market share. Whilst the exact 80/20 proportions seem to occur rarely on the internet, the general principle of a highly uneven spread of resources does often hold. For example, a small minority of sites probably attract a majority of visitors and hyperlinks.
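To make the 80/20 idea concrete, here is a minimal Python sketch that measures what share of all links goes to the most-linked 20% of pages. The function name `top_share` and the link counts are invented for illustration; they are not data from this post.

```python
def top_share(link_counts, top_fraction=0.2):
    """Return the fraction of all links held by the top `top_fraction` of pages."""
    if not link_counts:
        raise ValueError("link_counts must not be empty")
    ranked = sorted(link_counts, reverse=True)
    # Always include at least one page in the "top" group.
    top_n = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:top_n]) / total if total else 0.0


# Hypothetical inlink counts for ten pages of one site (made-up numbers).
example_counts = [120, 40, 9, 6, 5, 4, 3, 2, 1, 1]
share = top_share(example_counts)
print(f"Top 20% of pages hold {share:.0%} of the links")
```

A site whose pages all had similar link counts would give a value close to 0.2 here, whereas a strongly uneven site gives a value much closer to 1.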
The graph below of the number of links to web pages in UK universities illustrates the typical highly uneven distribution of links. The few points on the left hand side of the graph represent huge numbers of pages that have attracted only a few links. The scales are so large that it is impossible to read off exact values but 3.7 million pages only attracted one link from other UK university web sites. In addition, a few pages are targeted by a huge number of links and these show up as points on the bottom right of the graph. For example, one page had attracted 27,897 links.

The number of links to UK university web pages. Adapted from: Thelwall, M. & Wilkinson, D. (2003). Graph structure in three national academic Webs: Power laws with anomalies, Journal of the American Society for Information Science and Technology, 54(8), 706-712.
The second graph below shows that the strange-looking graph above in fact hides a clear pattern. It uses the same data but with logarithmic scales, which compress the largest values, and it shows a broomstick shape. In fact, most data involving links produce a broomstick shape or a hooked broomstick shape (with the hook at the top left-hand corner of the graph), so there is a certain regularity behind this 80/20 type of behaviour. Mathematicians can even give you an exact formula for this shape (the power law formula).

The number of links to UK university web pages, shown using logarithmic scales. Taken from: Thelwall, M. & Wilkinson, D. (2003). Graph structure in three national academic Webs: Power laws with anomalies, Journal of the American Society for Information Science and Technology, 54(8), 706-712.
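For readers who want to see the power law formula mentioned above, one common way of writing it is sketched here. The symbols are assumptions chosen for illustration and are not taken from the cited paper: N(k) is the number of pages with exactly k inlinks, and C and \alpha are positive constants fitted to the data.

```latex
N(k) = C \, k^{-\alpha}
\qquad\Longrightarrow\qquad
\log N(k) = \log C - \alpha \log k
```

Taking logarithms of both sides turns the curve into a straight line with slope -\alpha, which is why the same data look like a straight broomstick handle once both axes are logarithmic.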
SEO key point: If you design a search engine optimisation campaign that generates a very similar number of links to each page, or sends all links to the home page, then this may be seen as unnatural, because other similar sites may well have an uneven but regular distribution of the number of links to their pages. It would be better to arrange it so that most pages are targeted by a few links, with a few pages being targeted by considerably more.
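As a rough illustration of the key point above, the following Python sketch splits a link-building budget unevenly across pages, with the page ranked r receiving a share proportional to 1 / r**exponent. The function name `zipf_link_targets` and the default exponent of 1.0 are assumptions made for this example; the natural shape for a given type of site would have to be measured, not assumed.

```python
def zipf_link_targets(num_pages, total_links, exponent=1.0):
    """Split `total_links` across pages ranked 1..num_pages.

    The page ranked r gets a share proportional to 1 / r**exponent
    (an illustrative assumption, not a measured value).
    """
    if num_pages < 1 or total_links < 0:
        raise ValueError("need at least one page and a non-negative link total")
    weights = [1 / (rank ** exponent) for rank in range(1, num_pages + 1)]
    weight_sum = sum(weights)
    # Round each page's share down to a whole number of links.
    targets = [int(total_links * w / weight_sum) for w in weights]
    # Hand out the links lost to rounding, starting with the top-ranked pages.
    leftover = total_links - sum(targets)
    for i in range(leftover):
        targets[i % num_pages] += 1
    return targets


# Hypothetical campaign: 100 links spread over 10 pages.
print(zipf_link_targets(10, 100))
```

The result gives a few pages many links and most pages only a few, rather than the same number for every page. Real campaigns would also want some random variation around these targets, since a perfectly smooth curve is itself a pattern.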
There is a more general lesson from this observation for Scientific Search Engine Optimization: Although in real life many objects may have a natural average and vary quite closely about that average (e.g., the average height of humans), links (and some other things like page lengths and word frequencies) can behave completely differently. In particular, some variant of the 80/20 rule is common on the web. Hence, imitating a naturally successful web site with SEO requires an understanding of what kind of behaviour natural web sites display in terms of whether they vary about an average or whether they display some kind of extreme inequality.



Ben McKay says:
December 17th, 2008 at 11:07 am
Hi Mike,
Great piece, and something we should all bear in mind. It’s a tough one because deep-links are what SEOs dream of, so wouldn’t seem like a bad thing to many! It would be interesting to see a real-life example…
Cheers Mike, very interesting post.
Ben
Mike Reply:
December 20th, 2008 at 10:53 pm
Thanks Ben! Yes, this is definitely a tough issue. I’ll try to give a real-life example with some data in a later post.
Best wishes, Mike
LOTE Web says:
December 18th, 2008 at 9:26 am
Hi,
Really interesting observation. But in my opinion it generally applies to non-optimized sites, or maybe to sites that are not properly optimized. The general rule in optimization is to optimize a page for no more than a few primary keywords. So if a few of your site's pages attract something like 80% of the incoming links, it is obvious that either the rest of the pages are not properly optimized or their content lacks quality.
It would be really interesting to know whether Goog guys are taking such data into account.
Cheers, and please keep on with your great posts.
Chris
Mike Reply:
December 20th, 2008 at 11:04 pm
Thanks Chris - yes I think that you are right that there will be differences between optimized and non-optimized sites, also that a site that is well optimized will be different from a badly optimised site. But I think that it is also true that following a general rule has the potential to cause trouble with Google because as soon as it spots a pattern that is not natural then it may penalise the offending site. So we really need to know what is natural! But as a stop-gap I think it is useful to think about making sure that there is a healthy amount of variation in anything that is optimized for a site. Later posts should give more examples!
Also good point about what Google is really doing - we can only guess from the few bits of information that leak out, plus whatever academic research their scientists publish from time to time. But it seems that they are getting much cleverer with their ranking algorithms lately.
Thanks again, Mike
Nick Wilsdon says:
December 18th, 2008 at 2:55 pm
Good post Mike, I especially liked the graphs on link patterns, I hadn’t seen that research before. Added you to my reader!
Mike Reply:
December 20th, 2008 at 11:09 pm
Thanks Nick! By the way, thanks also for your link to a list of Google’s Patents - that’s really useful.
Link Building this Week (51.2008) | Wiep.net says:
December 19th, 2008 at 2:01 pm
[...] Mike Thelwall described how link marketing and Pareto’s Law are connected [...]
Mark Edmondson says:
December 19th, 2008 at 5:13 pm
Nice post about the Pareto law - the power law formula is indicative of scale free networks, which gives the majority of links to very few sites. ‘ere is the wiki - http://en.wikipedia.org/wiki/Scale-free_network
Applying this to SEO would maybe be lots of links to homepage then a few deeplinking?
Mike Reply:
December 20th, 2008 at 11:27 pm
Thanks Mark - this is a useful link for those who want to read deeper and don’t mind a few mathematical formulae. Yes I think that most sites would naturally attract most links to the home page and much fewer deep links - especially commercial sites. Blogs might be a bit of an exception though - blogrolls probably contain mostly home page links but links inside posts might directly link to individual content pages in other blogs and elsewhere?
seobro says:
December 23rd, 2008 at 1:44 pm
I am sure that the google algo is aware of many of these things. Try to see how things happen in nature. And yes, 20% of your links should go to those low value 80% of your pages, but I must say that in my experience it is more like a 95-5 rule with 5% of the pages getting 95% of the links.
Mike Reply:
December 28th, 2008 at 10:40 am
Thanks for this - I have to confess to not having a real feel for what the proportions should be, but I bet it varies by type of site. I’m hoping that Linkdex will eventually be able to answer this kind of question.
Replicating nature with the size of your page | Link Relationships says:
January 6th, 2009 at 10:35 pm
[...] A previous post pointed out that in some cases it is best to avoid average behaviour in your web site. For example, it is more natural for most pages to have few links and a few pages [...]
Fusing gut instinct and experience and science to get rankings | Link Relationships says:
January 19th, 2009 at 10:34 pm
[...] The first time Mike and I met we managed to spend two very short hours debating one simple idea. What is natural? Why, because I believe that it’s an idea that goes to the heart of intelligent and sustainable [...]
Ned says:
January 22nd, 2009 at 9:43 am
It is an interesting post. A similar relationship is seen when applied to the ranking of keywords from search terms against frequency of occurrence. We know this is the long-tail of search and it is demonstrated at http://www.seothegame.com/long-tail-queries-1161.
Mike Reply:
January 29th, 2009 at 7:51 pm
Thanks Ned! This graph shape seems to turn up in all kinds of data now.