Topics and linking

Friday, April 3, 2009 7:08
Posted in category Uncategorized

When one web site links to another web site it is common for the topic of the source page to be the same as the topic of the target page. For instance, a page of information about solicitors is more likely to link to solicitors’ web sites than to pubs. Nevertheless, it is also common for web sites to link to other sites on a different topic. For example, a page of information about commercial law solicitors might link to relevant pages of information about commercial accounting and finance. Hence it would be natural for a web site to attract links mainly from sites covering similar topics as well as a few from sites covering different topics. One old study has also shown this based upon the topics of web sites [Chakrabarti, S., Joshi, M. M., Punera, K., & Pennock, D. M. (2002). The structure of broad topics on the Web]. It also suggests that it is natural to attract links from “topics” that are particularly well represented on the web, such as computing.


Geographic linking again

Wednesday, March 11, 2009 20:54

A previous post explained the importance of ensuring that the links to a site come from its natural area of influence. Failing this test risks the site being identified by Google as unnatural and flagged as spam.

 

A previous post discussed linking on a natural scale within a single country, illustrating this with a graph showing that UK universities tend to link to their neighbours twice as often as to distant universities. The same applies even more on an international scale. For example, there is no technological difficulty hindering a Thai web site from linking to a UK web site, but it is unlikely that many Thai webmasters have heard of many UK web sites or brands. A few links are likely to exist, perhaps from ex-pats, but if a UK web site received many links from Thailand then, unless it is clearly relevant to Thailand, this could flag it to Google as unnatural and potential spam.

 

The diagram below illustrates that international linking patterns are a reality, even for universities, which are relatively international in scope (especially for research, which is mostly international). The circles each represent the web site of a top university in Europe or Asia. An arrow from one university to another represents many links from the first to the second (over 100). The clear pattern is that universities link to others from the same country (as shown by the domain name ending) far more than they link to overseas universities. The UK is a partial exception because top UK (and US) universities are global brands and also because English is the first language of science.

 


Above is the interlinking structure of the top web sites in the Asia-Pacific region. Lines represent at least 100 links. Top 5 universities from each Asia Pacific country but universities without enough links removed. Park, H. & Thelwall, M. (2006). Web science communication in the age of globalization: Links among universities’ websites in Asia and Europe. New Media & Society, 8(4), 631-652.

 

SEO key point: Try to attract links mainly from the countries in which the web site or brand is known.
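
One rough way to check the key point above is to count inbound links by the country ending of the linking domain, as the diagram does for universities. The sketch below is only an illustration: the function names and the example URLs are made up, and a top-level domain is an imperfect proxy for a country (a .com site could be based anywhere).

```python
from collections import Counter
from urllib.parse import urlparse


def tld_of(url):
    """Return the last label of the host name, e.g. 'uk' for www.example.ac.uk."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower() if "." in host else ""


def tld_shares(inbound_urls):
    """Map each top-level domain to its share of the inbound links."""
    counts = Counter(tld_of(u) for u in inbound_urls)
    total = sum(counts.values())
    return {tld: n / total for tld, n in counts.items()} if total else {}


# Hypothetical pages that link to a UK site.
example_links = [
    "http://www.example.ac.uk/links.html",
    "http://blog.example.co.uk/post",
    "http://www.example.org.uk/",
    "http://www.example.co.th/page",
]
shares = tld_shares(example_links)
```

Here three of the four hypothetical links come from .uk domains. What share counts as "natural" is not something the code can decide; it only makes the pattern visible.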


A response to “Link-based ranking algorithms”

Thursday, February 19, 2009 10:48
Posted in category SEO Theory

In January Mike Thelwall opened his article Link-based ranking algorithms with: “According to expert on the economics of search engines, Elizabeth van Couvering of London School of Economics, a key factor in the rise of Google was not only the brilliance of its link-based ranking algorithm but also the fact that it took time for spammers to develop link spam techniques.”

I must respectfully disagree with some points.

A Brief History of Link Spam

Link spam was developed in the mid-1990s as a means of driving traffic to unknown Web sites operating what were then known as “banner farms”. The banner farms might be simple pages with a single banner or much more insidious pages that set up bounce loops, where your browser continually loaded new pages every second or two.

Although we had a few search engines at the time, most Web sites were getting traffic from non-search promotion. These guys got links from directories, free-for-all link pages, primitive forums that didn’t even provide much in the way of moderation tools, guest books, Webrings, and hacked Web sites. You never knew where a link would take you in those days, and many people — having no idea what to place on their personal Web pages — simply put up small lists of links to sites they knew about without any regard for theme, relevance, or safety.

When the banner networks began enforcing quality measures on their publisher sites the link-droppers took things to the next level. They started capturing traffic from search engines. Early search engines were easy to game. All you had to do was embed a few keywords on a page, maybe in the title, and you were done. Your page banner earned money with every click. Sometimes the ad networks insisted the banners be placed only on pages with real copy. No problem: it was simple enough to copy content from somewhere else and put a banner above it.

Yahoo! helped improve search because its editorial staff filtered out the nasty sites. You generally felt you could trust Yahoo! to send you to reliable content. But the Web grew faster than Yahoo! could index it by hand and algorithmic search began to take on new importance. The second generation search engines were crawling the Web but they lacked the resources to index everything at once. In fact, most of those search engines were powered by Inktomi, the search engine behind search engines.

Inktomi maintained two indexes, a small one with link-rich documents that was distributed to all the Inktomi affiliates like Hotbot and Yahoo!, and a larger index from which Inktomi derived the small index. Inktomi made the small index selections (largely) on the basis of how many links were pointing to the pages. To get a site fully indexed in Inktomi’s primary index you had to have a lot of links pointing to your site.

In 1999 I was moderating a forum about Inktomi search for the SEO community. People kept asking questions about how to get their pages into the primary index. “Get more links,” someone suggested. So I experimented. I had a handful of pages (out of several thousand) that were consistently appearing in Inktomi search results. I set up crawl pages (pages with lists of links) on other domains and linked to the crawl pages from the pages I had indexed in Inktomi. Within a month my Inktomi traffic shot through the roof.

Brett Tabke, founder of Webmasterworld and the PubCon Web marketing conference series, took my idea and created the first managed reciprocal linking service. Several hundred people joined his experimental group. Every week they uploaded pages to their sites which linked out to other members’ sites. Brett in effect created the first link farm, and Inktomi dutifully crawled all those links and counted them toward inclusion in the Main Index.

Inktomi and other search engines eventually rolled out larger search indexes, rendering link farms unnecessary for inclusion. However, …

Enter Google — 1999

Soon after Brett launched his reciprocal linking experiment people in the SEO community noted with interest the appearance of a new small search engine: Google. Unlike Yahoo!, Excite, Looksmart, Altavista, and other relics of the mid-1990s, Google’s front page was almost empty. The other services were building portals, aggregating media, news, social activities, and custom content on the front page (we created our own “start” pages in those days). Google just offered search, nothing but search.

From the very beginning Google’s search results were considered to be pretty good. They were, in fact, no better than Altavista’s search results but their interface was so clean, so ad-free, that it was easy for people to get used to looking at a white page with nothing but search results. The blinking ad banners were gone. The paid listings were not cluttering up results.

By the end of 1999 and early 2000, people noticed something odd: anyone participating in a link farm seemed to have struck Google gold. The link farms were helping get content crawled, their links were passing PageRank, and the links were passing anchor text.

A great deal of attention was devoted to studying Google’s PageRank. In fact, for several years PageRank was badly mis-understood in the SEO community, who neglected to look at all the information Google founders Larry Page and Sergey Brin provided about their Google algorithm in their initial paper, Anatomy of a Large-Scale Hypertextual Web Search Engine. It should be noted that Google made a big fuss over PageRank, so all eyes were on PageRank for a few years.

PageRank only influenced the search index. It did not decide how the search engine crawled, indexed, or ranked pages. But one aspect of Google’s link-analysis escaped most SEOs’ attention: that inbound link anchor text was being treated as if it was part of the destination page titles and content. Some SEOs (the guys operating the link farms) immediately began seeing odd referrals for their pages. Soon enough, they learned to fix the anchor text in their link farms.

Enter Link-bombs — 2003

It was not until 2003, however, that the final piece of the puzzle became general knowledge. Everyone knew Google was counting links in a special way, so people were doing their best to obtain links. But they were not really focused on optimizing their link anchor text. If I swapped links with you, it was okay for you to use “Michael’s cool site” instead of “Buy Viagra”.

Then a blogger noticed that whenever he and his friends all linked to a blog post with the same anchor text that blog post zoomed to the top of Google’s search results within a matter of days or weeks for the exact same expression. They called it Google-bombing but by now other search engines were following Google’s lead and allowing links to pass anchor text. The Google bombs should more appropriately be called link bombs.

How links have changed since 1996

I built my first Web site in 1996. Almost from the beginning I found myself building multiple sites. One of the first sites I created was a niche Web directory. The directory eventually grew to include almost 2,000 listings at its peak. During the years I was directly involved in managing the directory I personally reviewed several thousand submissions. Most of those sites never made it in because they consisted of little more than a logo and a list of links.

People simply did not know what to do when they found themselves “able” to create Web pages. They mostly just linked to other people’s lists of links. Surfing the Web in those days was tedious because sometimes folks would just copy other people’s lists of links. I found myself looking at the same 50 or 100 links several times a day when I was evaluating directory submissions.

Some early “social” scripts also made it possible to populate Web sites with content, content made up of links. Netscape introduced the RSS/XML feed concept (they called it Rich Site Summary) in the mid- to late-1990s. A small cottage industry of RSS feed directories and repackagers rose up almost overnight. In 1999 I had created the largest resource for constantly updated science fiction news headlines just by embedding a few dozen Javascript and HTML link feeds into pages. It took about two days to create the site. It was ugly, it consisted mostly of links, and it made money.

Through the years people have learned how to write articles, thoughtful blog posts, and otherwise create interesting, useful content. Unless you’re just browsing social media sites it’s rare to come across a static HTML site that is just compiling lists of links. Web pages are now being used more expressively.

But social media sites have taken on the link-compiling role that early Web accounts once fulfilled. You can upload your bookmarks, share your browsing history, create “lenses”, “link directories”, promote interesting URLs, and otherwise add links to social media services that exist solely for the sake of compiling link lists. These services may prohibit self-promotion but millions of people have figured out that getting a link from DIGG might bring in some traffic.

The old-school spammers have certainly figured out plenty of ways to game the search algorithms: they have created fake news sites, fake Web directories, fake blogs, fake forums, fake social media sites, even fake Web sites. They still use crawl pages. They still drop links on other people’s guest books, Web forums, blogs, and self-managed directories. There are Internet archiving services, news group gateways, mailing list gateways, and other resources you seldom hear about where spammers have embedded their links — sometimes the links have persisted for years.

I recently looked at how many active accounts I have on a Web forum I manage. There are fewer than 2,000 people who have posted more than 5 times on that forum. The forum has 6,000 active accounts. And through the years we have deleted over 25,000 spam accounts.

To me, links are ubiquitous. I’ve been reading search engineers’ technical papers about PageRank, methods for improving PageRank, methods for detecting spam, methods for detecting high quality pages, and such for years. There are some very interesting ideas out there. And Google, along with Microsoft, Ask, and Yahoo! (which bought Inktomi) have developed complex, sophisticated link filtering algorithms. But links are not expressions of editorial opinion as Larry Page and Sergey Brin originally suggested when they developed Google. Nor was Google’s success based on the brilliance of PageRank.

Google’s success was based on the brilliance of its simplified Web interface. Their plain pages were a breath of fresh air for people who were tired of seeing too much information included on the page. It was the ease-of-use principle that won out. People did not have to buy subscriptions to use the service, click on annoying ads that obscured the listings, or spend any time thinking about whether the top-ranked pages were paid listings or not.

The quality of Google’s search results has always been vulnerable to link manipulation, and Google was being influenced by manipulative links at least as far back as 1999. The early tests for Google’s modified IR-based indexing/ranking algorithms were run against Stanford University’s Web pages. There probably weren’t many link-spammed pages there so it should be no surprise that PageRank helped improve basic search results.

But the truth of the matter is that the world’s first two Googlers had no idea of how the Web really worked, why people were linking to the stuff they were linking to, or what the significance behind links was. They found data that supported their theory and used that data to launch history’s most successful search engine to date. Their tests predated many of the most powerful link-manipulation techniques, and I think there should be no doubt that those techniques were only developed because of Google. But the search engineering community still has no way of measuring the extent to which link manipulation occurs, much less the variation in techniques that proliferate.

How Google has changed through the years

Google has fought back against link spam with some questionable practices. They have pressured many Web site operators into implementing “rel=’nofollow’” to provide them with a first-line of filtering. After several years I think this approach is having a significant benefit for both Google and the Web community, although I was extremely critical of it at first. But the nofollow solution did not solve the problem Google said it would solve.

Google has also resorted to “punishing” Web sites that buy links by artificially reducing their reported Toolbar PageRank values. In combatting paid links, however, Google for a while promoted the false idea that the leasing of links might somehow violate the U.S. Federal Trade Commission’s prohibition against undisclosed endorsements (a position I showed to be legally untenable because the U.S. government formally disclaims links as endorsements).

But Google should be credited with encouraging Webmasters to use ethical Web promotion standards. Being the most well-known search engine, they have a bully pulpit from which to preach a good message and some of their messages have been very good. Other search engines have followed Google’s lead in publishing Webmaster guidelines, providing Webmaster resources, and offering advice on how to be included in their indexes.

Google was one of the first search engines to openly reach out to and engage with the search engine optimization community. The engagement has not always been amicable but it helped make us better optimizers while helping search engines improve their results and services. Today’s Google is a far cry better than the first Google and today’s search industry is much more robust and concise than that which existed 10 years ago.

The next frontier in link-based Web indexing

I feel that Google has struggled to make PageRank work in any fashion like their original concept. Once you look past the fact that Google is filtering out billions of links, it does serve up pretty good results — if not always the best or most relevant results. Relevance and authority really cannot be determined by uninformed citation. I might go out of my way to link to the resources I feel are the best available for any topic, but if I am not an expert in that topic my citations provide less than optimal value for people who need guidance on where to look.

Because of Google’s insistence on making PageRank work, many very authoritative, highly informative, and extremely useful documents are passed over in Google’s search results for link-popular content that may not even provide good information. I used to say, if all other things are equal, PageRank is as good a method for breaking an IR score tie as any system I have seen. While that remains true on an algorithmic level, we still lack algorithms that can correct themselves the way human judgement seems to be able to correct itself.

In that respect, I feel social media sites like DIGG have taken the citation-based concept farther than Google could have with simple PageRank/link analysis. Although social media is also subjected to link abuse, these communities tend to be far more self-regulating than search engines. Some people therefore argue that the next generation of search should be based on social media.

Links help search engines find new content, they help search engines validate that content (at least by showing the content is connected with other content), and they can show — through diversity and volume — that content is considered to be important, if not necessarily accurate or authoritative. Some link analysis algorithms have gone beyond PageRank to develop measures of trust and authority but they must all be protected by filters, moderated by human judgement, and updated to match the complexity of the Web relationships they are indexing.

Maybe a combination of three foundations (link analysis, social engineering, and semantic analysis) will bring about the next revolution in search technology. Link analysis is undoubtedly here to stay, but there was never really a golden age when PageRank worked as promoted, when link spam didn’t exist, and when search results could be fully trusted.

Author: Michael Martinez’s research into search engine optimization and ranking strategies for Inktomi’s customer search engines helped hundreds of business Web site operators improve their stability and visibility in the Inktomi search engines.


Language and linking

Thursday, February 5, 2009 22:08

Assuming that your web site is written in English, should you try to attract links to it from pages in other languages? If a Spanish web site links to it, would this be seen by search engines as a promising indicator of multilingual appeal or would it flag the site as potential spam? To answer this question we need to know whether it is natural to attract links across languages. The diagrams below illustrate links from pages in EU universities written in Spanish, Dutch, Swedish, English and French to other EU universities. It should be fairly easy to work out which is which for at least four of them! (Universities again in this blog? Yes, this is what I have data for and of course universities are certainly not typical and do not behave like commercial web sites but I hope we will get access to non-university data soon.) In the diagrams countries are represented by their Top Level Domains. The thick arrows represent many links and arrows are not drawn when there are few links. The pattern is for the vast majority of links, but not all, to be between countries sharing the language of the link page. Hence, except for English, cross-lingual linking seems rare, even in this highly-educated context.

Links between EU universities in one language - but which one?

Links between EU universities in one language.

Links between EU universities in one language.

Links between EU universities in one language.

Links between EU universities in one language.

All diagrams taken from: Thelwall, M., Tang, R. & Price, E. (2003). Linguistic patterns of academic web use in Western Europe, Scientometrics, 56(3), 417-432.

 

What is natural in terms of cross-lingual linking? It seems that a webmaster would be unlikely to create a link to a page that most visitors could not read. In UK web sites this suggests that there would be few genuine links to pages in other languages, because few English people are bilingual. The exceptions would be languages represented by significant migrant/diaspora communities, such as Polish, Urdu, Punjabi, and Hindi. There would also be a few links created by smaller communities, including overseas students (e.g., Chinese, Japanese).

 

For pages in English the situation concerning which language web sites could naturally link to them is unlike all other languages. This is because English is an international language, especially on the Internet. It would not be unusual for non-English sites to link to English sites, especially from countries with citizens that can often read English (e.g., The Netherlands, Scandinavia), and especially for sites aimed at more educated users who are more likely to have to use English at work.

 

SEO key point: Unless dictated by the site theme, try to attract links mostly from same-language pages, but a few links from other languages that would naturally link to the site are also useful. For sites in English, it would be reasonable to attract many more links than average from non-English sites.


Link-based ranking algorithms

Wednesday, January 21, 2009 22:29

According to expert on the economics of search engines, Elizabeth van Couvering of London School of Economics, a key factor in the rise of Google was not only the brilliance of its link-based ranking algorithm but also the fact that it took time for spammers to develop link spam techniques. As a result of this, whilst the established search engines of the late 90s like AltaVista devoted up to half of the time of their staff to fighting spam, Google could devote most of its time to developing its core search and ranking algorithms. This made it cost-effective as well as simply more effective.

 

Today there are many link-based ranking algorithms and every major search engine probably uses one at the heart of its algorithm. The reason is simple: important pages are linked to often and how else could you identify important pages other than by traffic and links? And for new sites that a search engine does not yet rank, it might not have good traffic information.

 

But what are the new algorithms and are they all essentially the same? As has been widely written about, Google’s PageRank is a mathematical process based upon links between pages and sites, which tends to assign the best ranks to pages and sites that are most linked to, especially when the links are themselves from important sites (i.e., also attracting the most links). An early competitor to PageRank was HITS (Hyperlink Induced Topic Search), which worked similarly to PageRank but assigned good ranks to pages that either linked to many pages relevant to a search (called hubs) or were linked to by many pages relevant to a search (called authorities).
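
To make the PageRank idea concrete, here is a minimal sketch of the textbook iteration on a toy set of pages. It is an illustration only, not Google’s production system: the function name, the four-page toy web and the damping value of 0.85 (the figure usually quoted from the original paper) are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank. `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Toy web: three pages link to "c", and "c" links back to "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

In this toy web "c" ends up with the best rank because it attracts the most links, and "a" beats "b" and "d" because its single inlink comes from the important page "c".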

 

HITS is clearly a different algorithm: PageRank ignores hubs because it does not care how many outlinks a site has. Also, HITS is topic-based, meaning that the rankings depend on the search entered. But the difference is not as large as it seems. Firstly, good hub pages (e.g., a directory of links for a topic) would naturally attract links and hence get a good authority rating. Secondly, Google’s PageRank is always used in conjunction with other ranking factors like keyword frequency and so, in practice, it is likely to be not too different to HITS. A version of HITS was used by an old search engine called Teoma. Its comparative lack of success was probably because it was more expensive to implement than PageRank or simply because it started years after Google.
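
The hub and authority idea can be sketched in a few lines. This is a simplified illustration of the standard HITS iteration, not Teoma’s implementation: the function name and the toy graph are invented, and a real system would first build the graph from pages relevant to the search entered.

```python
import math


def hits(links, iterations=50):
    """Simplified HITS. `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub score: sum of the authority scores of the pages linked to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Rescale both sets of scores so they do not grow without bound.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Toy topic graph: a directory and a blog linking to solicitors' sites.
topic_graph = {
    "directory": ["firm1", "firm2", "firm3"],
    "blog": ["firm1"],
}
hub_scores, auth_scores = hits(topic_graph)
```

The directory comes out as the best hub and "firm1", which both hubs link to, as the best authority.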

 

My feeling is that Google has identified what is essentially the right way to go about link-based ranking (i.e., give higher ranks to sites attracting more and higher quality links) and that all other link algorithms in use are likely to be broadly similar in practice most of the time, even if they differ for individual sites, occasionally substantially.

 

The exception to this is SpamRank: ranking pages on the likelihood of them being Spam. Pages that are known to be spam can pass on SpamRank to the pages that they link to in the same way that PageRank is passed on. The results of combining SpamRank (or equivalent) with PageRank (or equivalent) are likely to be very different from PageRank alone, especially for Spam sites!
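
The passing-on of spam scores described above can be sketched as a PageRank-style propagation that starts only from pages already known to be spam. This follows the description in this post rather than any published algorithm; the function name, the damping value and the toy graph are assumptions for illustration.

```python
def spam_propagation(links, known_spam, damping=0.85, iterations=50):
    """Pass a spam score from known spam pages to the pages they link to."""
    if not known_spam:
        raise ValueError("at least one known spam page is needed")
    pages = set(links) | {t for ts in links.values() for t in ts} | set(known_spam)
    # Only known spam pages receive a starting score.
    seed = {p: (1.0 / len(known_spam) if p in known_spam else 0.0) for p in pages}
    score = dict(seed)
    for _ in range(iterations):
        new_score = {p: (1.0 - damping) * seed[p] for p in pages}
        for src in pages:
            targets = links.get(src, [])
            for t in targets:
                # Each page shares its spam score among the pages it links to.
                new_score[t] += damping * score[src] / len(targets)
        score = new_score
    return score


# Toy graph: one known spam page links to a shop; a news site and a blog
# link to each other, and the news site also links to the shop.
toy_links = {"spam1": ["shop"], "news": ["shop", "blog"], "blog": ["news"]}
spam_scores = spam_propagation(toy_links, known_spam={"spam1"})
```

In this toy graph the shop picks up a spam score from its spam inlink, while the news site and the blog stay at zero because no spam page links to them, directly or indirectly. How such a score would be combined with PageRank is left open here, as it is in the post.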


Fusing gut instinct and experience and science to get rankings

Monday, January 19, 2009 22:33

The first time Mike and I met we managed to spend two very short hours debating one simple idea. What is natural? Why? Because I believe that it’s an idea that goes to the heart of intelligent and sustainable SEO.

The seed of the ‘what is natural’ thought goes back 3 years or so, when I stumbled across a research project that was sponsored by Yahoo and talked about detecting link spam or, put another way, SEO experts trying to game the system. If Yahoo had a number of mathematical brains working on finding out when a link was a good, or should I say ‘natural’, link, then I was certain that Google did too and was probably very good at it and likely to get better. This then led me down a path that influenced how I practised SEO then and how I still practise it to this day.

Think like Google
If you want to do SEO well, which at the end of the day means getting traffic that generates a response, you have got to make sure your efforts are rewarded. So what if every link you created was actually doing you more harm than good, because it was moving you ever further away from what is ‘natural’? It’s a thought that I think is possible. Let me explain.

Let us assume there is a natural range for the following inbound link characteristics:

•    PageRank – the number of links you have pointing at your site at each PageRank level from 0 to 10
•    Home vs Deep link – how many links would naturally point at your homepage and how many would point at your deep content
•    Anchor text – what is the natural ratio of keyword to brand / URL?
•    Keyword  variation – If all the links are on a range of head tail terms e.g. credit card is that natural? My gut instinct tells me to mix it up and blend with 3 and 4 word keyword strings as that is what would happen naturally if bloggers and other publishers were linking to you
•    Link growth – Over time, how do links grow? What is a normal link growth curve? When do the warning flags go up?
•    Links from a single article or post – if you have three links to your site from a single blog post all with head tail keywords, I would suspect you of buying those links. That’s if I were Google. So what is an acceptable or natural number of links per page?
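
Several of the characteristics in the list above can be measured from a simple list of inbound links. The sketch below is hypothetical throughout: the record fields (source, target, anchor), the brand terms and the example links are invented, and it makes no claim about what the natural ranges are, which is exactly the open question.

```python
from collections import Counter
from urllib.parse import urlparse


def link_profile(links, brand_terms):
    """Summarise a list of inbound links given as dicts with the keys
    'source', 'target' and 'anchor' (a made-up record format)."""
    total = len(links)
    if total == 0:
        return {}
    # Links whose target path is empty or "/" point at the homepage.
    home = sum(1 for l in links if urlparse(l["target"]).path in ("", "/"))
    # Anchors containing a brand name or the URL count as brand anchors.
    branded = sum(
        1 for l in links
        if any(term in l["anchor"].lower() for term in brand_terms)
    )
    per_source = Counter(l["source"] for l in links)
    return {
        "homepage_share": home / total,
        "brand_anchor_share": branded / total,
        "max_links_from_one_page": max(per_source.values()),
    }


# Hypothetical inbound links to www.example.com.
sample_links = [
    {"source": "http://blog.example.org/post1",
     "target": "http://www.example.com/", "anchor": "Example Ltd"},
    {"source": "http://blog.example.org/post1",
     "target": "http://www.example.com/cards/compare", "anchor": "credit card"},
    {"source": "http://forum.example.net/t/9",
     "target": "http://www.example.com/", "anchor": "www.example.com"},
    {"source": "http://news.example.info/a",
     "target": "http://www.example.com/cards/faq",
     "anchor": "credit card fees explained"},
]
profile = link_profile(sample_links, brand_terms=["example ltd", "example.com"])
```

For these four made-up links, half point at the homepage, half use brand or URL anchor text, and one blog post carries two of the links. Tracked over time, the same counts would also give the link growth curve.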

When you start thinking in this way it opens up a can of worms that challenges a number of common SEO techniques.

Is it possible that every PageRank 4 link you build moves you further away from natural, or that every ‘credit card’ link is showing you up to be a link engineer and is downgrading the whole SEO effort?

Back to Mike
What I love about getting practitioners of SEO together with scientists like Mike is that we can find out ‘what is natural’ and alter our strategy. I think SEO needs to evolve and we’ve relied on gut feel and hearsay for too long. With the help of science and technology we can think more intelligently about what we are doing every day to get the rankings that make a difference to a site’s performance. I believe that, with Mike’s help, the industry can take a big leap forward, and he has my full support. I also know that the technology supporting Mike in his discoveries is going to lift the lid on this whole subject, so watch this space.

If you have another ‘what is natural’ discovery you would like to discover, please add it below in the comments.

Guest post written by: Matt Roberts, an Online Marketing Consultant based in London. He is currently working on TalkAboutDebt.co.uk - a new debt help community focused on IVA.


Make your SEO experiments work

Friday, January 9, 2009 9:53
Posted in category SEO Theory, Scientific SEO

Knowledge from successful SEO experiments has always separated good SEO consultants from industry leading professionals. Every agency or professional always has some “proprietary knowledge” that gives them the edge over others.

The very nature of SEO and the complexity of Google’s ranking algorithm mean that there are always areas where SEO experiments can help improve your SEO program.

My five golden rules of SEO experiments are:

  1. Make sure you understand what you are trying to achieve.
  2. Set up a control / benchmark either with your clients or your recent experience
  3. Experiment with a big enough sample size. The bigger the test, the more accurate the data.
  4. Document everything - Make sure you store the raw data somewhere; many scientific discoveries in the 20th century came from looking at data from previous experiments
  5. Make sense of your results. Although some experiments are a simple yes or no, I’ve always found that publishing your results online means you will get some great comments and criticism from other professionals
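
Rule 3 can be made a little more precise with a standard piece of statistics. If an experiment measures a proportion (say, the share of test pages that moved up in the rankings), the usual standard error formula shows how uncertainty shrinks as the sample grows. The sketch below assumes a simple random sample, and the numbers are invented for illustration.

```python
import math


def standard_error(proportion, sample_size):
    """Standard error of a proportion measured on a simple random sample."""
    return math.sqrt(proportion * (1.0 - proportion) / sample_size)


# Hypothetical experiment: half of the test pages improved their ranking.
se_25_pages = standard_error(0.5, 25)
se_100_pages = standard_error(0.5, 100)
```

Going from 25 to 100 test pages halves the standard error, so a bigger test does give more accurate data, but with diminishing returns: four times the work buys only twice the precision.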

Luckily for our industry, there are lots of SEO folks that have kindly shared their research with us. Here are my 3 favourite SEO experiments that have worked well and given useful information. Great work guys, keep sharing the knowledge!

In addition it’s also worth mentioning websites such as SEO theory, which gives lots of food for thought, and also Science for SEO for a scientific insight into how search engines work.

Coming next week, I will release the Results of LinkRelationships.com’s SEO experiment: The importance of quality over quantity.

Guest post author: Kun Dang


Replicating nature with the size of your page

Tuesday, January 6, 2009 18:52

A previous post pointed out that in some cases it is best to avoid average behaviour in your web site. For example, it is more natural for most pages to have few links and a few pages to have many links than for all pages to have a similar number of links.

The same applies to the size of pages in a site. The diagram below shows the distribution of page sizes for Australian university websites, as measured by the number of words in each page. Note that the graph has a distorted (logarithmic) scale to show the underlying pattern. From the top-left of the graph it can be seen that a huge number of pages have up to a few hundred words. From the bottom-right of the graph it can be seen that a few pages have a huge number of words. This suggests that it is natural for most pages in a web site to be quite small but it is OK for some of the pages to be considerably larger or even huge. Nevertheless, there isn’t a natural “average” size that most pages are similar to, just a tendency for smaller pages to be more common.

An artificial site could easily have all pages being almost exactly the same size or varied slightly around what the author believed to be the optimal average size. This could show up to search engines as anomalous behaviour and should be avoided. The need to vary significantly from the average should not be exaggerated, because many perfectly good sites will be designed using a template by a designer that strives to get a similar appearance to all or most pages. This would have the natural effect of producing a site with many similar-sized pages. Nevertheless, this approach may not be optimal as the result may have some characteristics of an artificial web site.


The distribution of page sizes (words) for Australian universities. From: Thelwall, M. (2005). Text characteristics of English language university web sites. Journal of the American Society for Information Science and Technology, 56 (6), 609–619.
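
To see how the pages of your own site are spread, you can count them in logarithmic size bands, much as the graph above does. The Python sketch below is a rough illustration only: the word counts are invented, and size_band and band_counts are hypothetical helper names. It groups pages into power-of-ten bands and compares the mean with the median.

```python
from collections import Counter
from statistics import mean, median


def size_band(words):
    """Power-of-ten band of a page: 0 for 1-9 words, 1 for 10-99, 2 for 100-999, ..."""
    return len(str(words)) - 1


def band_counts(word_counts):
    """Number of pages in each band, ignoring empty pages."""
    counts = Counter(size_band(n) for n in word_counts if n > 0)
    return dict(sorted(counts.items()))


# Hypothetical word counts for the pages of one small site.
page_words = [120, 85, 240, 310, 95, 150, 60, 430, 2200, 18000, 75, 200]

print("pages per band:", band_counts(page_words))
print("mean:", round(mean(page_words)), "median:", median(page_words))
```

In a skewed distribution like the one in the graph, a few huge pages pull the mean far above the median, so the mean says little about a typical page. If every page on a site falls in one narrow band, that is the kind of uniformity the post warns about.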


Natural numbers of links to and from web sites

Monday, December 15, 2008 22:26

There is probably a natural average number of links to and from a web site that is based upon the type and size of its owning organisation. For example, the web site of a corner shop is likely to attract few links and host few links. In contrast, the site of a large university will almost certainly host and attract very many links, perhaps hundreds of thousands. For UK universities, the number of links a web site should receive has been calculated based upon the amount of research the university does. In the graph below, showing real data from UK universities, the number of links received by a university from web sites in other UK universities is proportional to the target university’s research productivity. Hence, if a university web site attracted many more links than appropriate for its research productivity then this could stand out as unusual and potential spam to Google. Of course, Google is unlikely to have data on university research and to use it in this way! But it will have other data that are also related, such as the number of visitors, pages, and links from the site. If Google cross-references this kind of data then it is likely to reveal the anomaly.

The number of links to UK university web sites from other UK university web sites – showing that this is approximately proportional to the research productivity of the target university. [Source: Thelwall, M. (2002). Conceptualizing documentation on the Web: an evaluation of different heuristic-based models for counting links between university web sites, Journal of the American Society for Information Science and Technology, 53(12), 995-1005.]


SEO key point: One source of unnatural behaviour is a web site attracting or hosting too many or too few links. For example, a site with a few pages that are all long link lists is an obvious spam site. Less obviously, attracting a few extra links from normal sites is likely to be more valuable in the long run than generating many extra artificial links.
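
The cross-referencing idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of how Google or any search engine actually works: the sites and numbers are invented, the “size” column stands in for whatever related measure is available (pages, visitors, or research output), and flag_unusual is a hypothetical helper. It assumes links are roughly proportional to size, estimates the typical links-per-size ratio with a median, and flags sites that are far from it.

```python
from statistics import median


def flag_unusual(sites, tolerance=2.0):
    """Return names of sites whose inlink count is more than `tolerance`
    times above or below what their size would suggest.

    sites: list of (name, size, inlinks) tuples with size > 0.
    """
    # Typical links-per-unit-of-size; the median is not distorted by a few extreme sites.
    typical = median(links / size for _, size, links in sites)
    flagged = []
    for name, size, links in sites:
        ratio = links / (typical * size)
        if ratio > tolerance or ratio < 1 / tolerance:
            flagged.append(name)
    return flagged


# Hypothetical data: (site name, size measure, links received).
sites = [
    ("site-a", 100, 1000),
    ("site-b", 200, 2100),
    ("site-c", 300, 2900),
    ("site-d", 150, 1500),
    ("site-e", 120, 6000),  # far more links than its size suggests
]

print(flag_unusual(sites))
```

The tolerance of 2.0 is an arbitrary choice for the example. The point is only that a site attracting several times more links than comparable sites of its size stands out as soon as the two figures are put side by side.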


Geographic linking

Wednesday, November 19, 2008 11:45

Effective scientific search engine optimisation (SEO) means ensuring that a web site looks important to search engines in as natural a way as possible. One factor that SEO professionals rarely take into account is geography: is it more natural for a web site to attract links from nearby web sites (e.g., in the same city or country) or equally from all over the web? In theory it is no more difficult for a webmaster to create a link to a web site on the other side of the world than to a neighbouring web site so geography should not be important. But in practice the webmaster is more likely to link to the neighbouring web site because she recognises the brand, uses the service or has seen it advertised. Hence natural web linking should reflect the geographic spread of people who would naturally be aware of the site or brand.

Below is an example to illustrate this point. It shows the average tendency for UK universities to link to each other as a function of their distance apart. It shows that universities that are close together (within 50 miles of each other) are twice as likely to hyperlink than those that are hundreds of miles apart. This factor of two is probably peculiar to universities because part of their job is to be aware of each other and to communicate and collaborate. Note that the distant sites do interlink even though they do it much less than close sites. The important point here is not to have all local links, all distant links or a random selection of links, but that there is a natural way in which the spread of links relates to geography. Anyone who does not know what this natural spread is will run the risk that their SEO strategy is unnatural and attracts spam penalties for their site from search engines.

More generally, most web sites probably attract the vast majority of links from their natural geographic area of influence and probably only a few from outside. Links to commercial web sites will probably also follow a geographic pattern but the exact pattern will depend upon factors like their history, market strategy and business partners.


Factoring out university sizes, the tendency to interlink decreases for UK universities, the further apart they are. Figure adapted from: Thelwall, M. (2002). Evidence for the existence of geographic trends in university web site interlinking. Journal of Documentation, 58(5), 563-574.

SEO key point: Try to attract links to your site mainly, but not exclusively, from its natural geographic area of influence. If you have data on the geographic linking pattern of successful sites similar to yours then use that data to guide the selection of links to your site.
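
If you know roughly where the sites linking to you are based, the geographic spread of your links is easy to measure. The Python sketch below is a minimal illustration: the coordinates are approximate city locations chosen for the example, local_share is a hypothetical helper, and the 50 mile radius simply echoes the university figure above rather than being a recommended threshold. It uses the standard haversine formula for the distance between two points on the globe.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean radius of the Earth


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def local_share(site, linkers, radius_miles=50):
    """Fraction of linking sites located within radius_miles of the target site."""
    near = sum(
        1 for lat, lon in linkers
        if haversine_miles(site[0], site[1], lat, lon) <= radius_miles
    )
    return near / len(linkers)


# Hypothetical example: a site based in Birmingham and four sites linking to it.
birmingham = (52.4862, -1.8904)
linkers = [
    (52.5862, -2.1288),  # Wolverhampton (approximate)
    (52.4068, -1.5197),  # Coventry (approximate)
    (51.5074, -0.1278),  # London (approximate)
    (55.9533, -3.1883),  # Edinburgh (approximate)
]

print(f"share of links from within 50 miles: {local_share(birmingham, linkers):.0%}")
```

Comparing this share with the same figure for successful sites similar to yours gives a rough check on whether your link profile follows a natural geographic pattern.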
