In January Mike Thelwall opened his article Link-based ranking algorithms with: “According to expert on the economics of search engines, Elizabeth van Couvering of London School of Economics, a key factor in the rise of Google was not only the brilliance of its link-based ranking algorithm but also the fact that it took time for spammers to develop link spam techniques.”
I must respectfully disagree with some points.
A Brief History of Link Spam
Link spam was developed in the mid-1990s as a means of driving traffic to unknown Web sites operating what were then known as “banner farms”. The banner farms might be simple pages with a single banner or much more insidious pages that set up bounce loops, where your browser continually loaded new pages every second or two.
Although we had a few search engines at the time, most Web sites were getting traffic from non-search promotion. These guys got links from directories, free-for-all link pages, primitive forums that didn’t even provide much in the way of moderation tools, guest books, Webrings, and hacked Web sites. You never knew where a link would take you in those days, and many people — having no idea what to place on their personal Web pages — simply put up small lists of links to sites they knew about without any regard for theme, relevance, or safety.
When the banner networks began enforcing quality measures on their publisher sites, the link-droppers took things to the next level. They started capturing traffic from search engines. Early search engines were easy to game. All you had to do was embed a few keywords on a page, maybe in the title, and you were done. Your page banner earned money with every click. Sometimes the ad networks insisted the banners be placed only on pages with real copy. No problem: it was simple enough to copy content from somewhere else and put a banner above it.
Yahoo! helped improve search because its editorial staff filtered out the nasty sites. You generally felt you could trust Yahoo! to send you to reliable content. But the Web grew faster than Yahoo! could index it by hand and algorithmic search began to take on new importance. The second generation search engines were crawling the Web but they lacked the resources to index everything at once. In fact, most of those search engines were powered by Inktomi, the search engine behind search engines.
Inktomi maintained two indexes, a small one with link-rich documents that was distributed to all the Inktomi affiliates like Hotbot and Yahoo!, and a larger index from which Inktomi derived the small index. Inktomi made the small index selections (largely) on the basis of how many links were pointing to the pages. To get a site fully indexed in Inktomi’s primary index you had to have a lot of links pointing to your site.
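The selection step described above can be sketched roughly as follows. This is only an illustration: Inktomi never published its criteria in this detail, and the function name `select_small_index`, the `min_inlinks` threshold, and the sample URLs are all hypothetical.

```python
from collections import Counter

def select_small_index(links, min_inlinks=2):
    """Return the pages that have at least `min_inlinks` inbound links.

    `links` is an iterable of (source_url, target_url) pairs.
    The threshold is an assumption made for illustration.
    """
    # Count each (source, target) pair once so repeated links do not inflate the total.
    inlinks = Counter(target for _, target in set(links))
    return {page for page, count in inlinks.items() if count >= min_inlinks}

links = [
    ("a.example/links", "mysite.example/"),
    ("b.example/links", "mysite.example/"),
    ("a.example/links", "other.example/"),
]
small_index = select_small_index(links)
print(small_index)
```

Under a rule like this, the only lever a site owner controls is the number of pages linking in, which is why crawl pages and link farms worked.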
In 1999 I was moderating a forum about Inktomi search for the SEO community. People kept asking questions about how to get their pages into the primary index. “Get more links,” someone suggested. So I experimented. I had a handful of pages (out of several thousand) that were consistently appearing in Inktomi search results. I set up crawl pages (pages with lists of links) on other domains and linked to the crawl pages from the pages I had indexed in Inktomi. Within a month my Inktomi traffic shot through the roof.
Brett Tabke, founder of Webmasterworld and the PubCon Web marketing conference series, took my idea and created the first managed reciprocal linking service. Several hundred people joined his experimental group. Every week they uploaded pages to their sites which linked out to other members’ sites. Brett in effect created the first link farm, and Inktomi dutifully crawled all those links and counted them toward inclusion in the Main Index.
Inktomi and other search engines eventually rolled out larger search indexes, rendering link farms unnecessary for inclusion. However, …
Enter Google — 1999
Soon after Brett launched his reciprocal linking experiment, people in the SEO community noted with interest the appearance of a new small search engine: Google. Unlike Yahoo!, Excite, Looksmart, Altavista, and other relics of the mid-1990s, Google’s front page was almost empty. The other services were building portals that aggregated media, news, social activities, and custom content on the front page (we created our own “start” pages in those days). Google just offered search, nothing but search.
From the very beginning Google’s search results were considered to be pretty good. They were, in fact, no better than Altavista’s search results but their interface was so clean, so ad-free, that it was easy for people to get used to looking at a white page with nothing but search results. The blinking ad banners were gone. The paid listings were not cluttering up results.
By the end of 1999 and early 2000, people noticed something odd: anyone participating in a link farm seemed to have struck Google gold. The link farms were helping get content crawled, their links were passing PageRank, and the links were passing anchor text.
A great deal of attention was devoted to studying Google’s PageRank. In fact, for several years PageRank was badly misunderstood in the SEO community, who neglected to look at all the information Google founders Larry Page and Sergey Brin provided about their Google algorithm in their initial paper, The Anatomy of a Large-Scale Hypertextual Web Search Engine. It should be noted that Google made a big fuss over PageRank, so all eyes were on PageRank for a few years.
PageRank only influenced the search index. It did not decide how the search engine crawled, indexed, or ranked pages. But one aspect of Google’s link analysis escaped most SEOs’ attention: inbound link anchor text was being treated as if it were part of the destination page’s title and content. Some SEOs (the guys operating the link farms) immediately began seeing odd referrals for their pages. Soon enough, they learned to fix the anchor text in their link farms.
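A toy inverted index shows why anchor text mattered so much. This is a minimal sketch, not Google’s implementation: the `build_index` function, its data shapes, and the example URLs are assumptions made for illustration.

```python
from collections import defaultdict

def build_index(pages, links):
    """Build an inverted index mapping each word to a set of URLs.

    `pages` maps URL -> page text. `links` is a list of
    (source_url, target_url, anchor_text) tuples.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    # Anchor text is credited to the page the link points at,
    # not to the page the link sits on.
    for _source, target, anchor in links:
        for word in anchor.lower().split():
            index[word].add(target)
    return index

pages = {"shop.example/": "welcome to our store"}
links = [("farm.example/page1", "shop.example/", "cheap widgets")]
index = build_index(pages, links)
print(index["widgets"])
```

The target page never uses the word “widgets”, yet it is indexed for that word because someone else’s link said so. That explains the odd referrals, and why link farm operators soon started choosing their anchor text deliberately.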
Enter Link-bombs — 2003
It was not until 2003, however, that the final piece of the puzzle became general knowledge. Everyone knew Google was counting links in a special way, so people were doing their best to obtain links. But they were not really focused on optimizing their link anchor text. If I swapped links with you, it was okay for you to use “Michael’s cool site” instead of “Buy Viagra”.
Then a blogger noticed that whenever he and his friends all linked to a blog post with the same anchor text that blog post zoomed to the top of Google’s search results within a matter of days or weeks for the exact same expression. They called it Google-bombing but by now other search engines were following Google’s lead and allowing links to pass anchor text. The Google bombs should more appropriately be called link bombs.
How links have changed since 1996
I built my first Web site in 1996. Almost from the beginning I found myself building multiple sites. One of the first sites I created was a niche Web directory. The directory eventually grew to include almost 2,000 listings at its peak. During the years I was directly involved in managing the directory I personally reviewed several thousand submissions. Most of those sites never made it in because they consisted of little more than a logo and a list of links.
People simply did not know what to do when they found themselves “able” to create Web pages. They mostly just linked to other people’s lists of links. Surfing the Web in those days was tedious because sometimes folks would just copy other people’s lists of links. I found myself looking at the same 50 or 100 links several times a day when I was evaluating directory submissions.
Some early “social” scripts also made it possible to populate Web sites with content, content made up of links. Netscape introduced the RSS/XML feed concept (they called it Rich Site Summary) in the mid- to late-1990s. A small cottage industry of RSS feed directories and repackagers rose up almost overnight. In 1999 I had created the largest resource for constantly updated science fiction news headlines just by embedding a few dozen Javascript and HTML link feeds into pages. It took about two days to create the site. It was ugly, it consisted mostly of links, and it made money.
Through the years people have learned how to write articles, thoughtful blog posts, and otherwise create interesting, useful content. Unless you’re just browsing social media sites it’s rare to come across a static HTML site that is just compiling lists of links. Web pages are now being used more expressively.
But social media sites have taken on the link-compiling role that early Web accounts once fulfilled. You can upload your bookmarks, share your browsing history, create “lenses”, “link directories”, promote interesting URLs, and otherwise add links to social media services that exist solely for the sake of compiling link lists. These services may prohibit self-promotion but millions of people have figured out that getting a link from DIGG might bring in some traffic.
The old-school spammers have certainly figured out plenty of ways to game the search algorithms: they have created fake news sites, fake Web directories, fake blogs, fake forums, fake social media sites, even fake Web sites. They still use crawl pages. They still drop links on other people’s guest books, Web forums, blogs, and self-managed directories. There are Internet archiving services, news group gateways, mailing list gateways, and other resources you seldom hear about where spammers have embedded their links — sometimes the links have persisted for years.
I recently looked at how many active accounts I have on a Web forum I manage. There are fewer than 2,000 people who have posted more than 5 times on that forum. The forum has 6,000 active accounts. And through the years we have deleted over 25,000 spam accounts.
To me, links are ubiquitous. I’ve been reading search engineers’ technical papers about PageRank, methods for improving PageRank, methods for detecting spam, methods for detecting high quality pages, and such for years. There are some very interesting ideas out there. And Google, Microsoft, Ask, and Yahoo! (which bought Inktomi) have all developed complex, sophisticated link filtering algorithms. But links are not expressions of editorial opinion as Larry Page and Sergey Brin originally suggested when they developed Google. Nor was Google’s success based on the brilliance of PageRank.
Google’s success was based on the brilliance of its simplified Web interface. Their plain pages were a breath of fresh air for people who were tired of seeing too much information included on the page. It was the ease-of-use principle that won out. People did not have to buy subscriptions to use the service, click on annoying ads that obscured the listings, or spend any time thinking about whether the top-ranked pages were paid listings or not.
The quality of Google’s search results has always been vulnerable to link manipulation, and Google was being influenced by manipulative links at least as far back as 1999. The early tests for Google’s modified IR-based indexing/ranking algorithms were run against Stanford University’s Web pages. There probably weren’t many link-spammed pages there so it should be no surprise that PageRank helped improve basic search results.
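To illustrate why a link-based score is open to manipulation, here is a minimal sketch of the simplified textbook form of PageRank, computed by power iteration. It is not Google’s production algorithm. The damping factor of 0.85 is the value commonly quoted from the original paper, and the page names and graph are hypothetical.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `graph` maps each page to the list of pages it links to.
    Every link target must also appear as a key.
    """
    n = len(graph)
    rank = {page: 1.0 / n for page in graph}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in graph}
        for page, outlinks in graph.items():
            if outlinks:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank evenly over all pages.
                for target in graph:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Two competing pages, each with one link from the same hub.
honest = {"target": [], "rival": [], "hub": ["target", "rival"]}
# The same graph after three farm pages start linking to "target".
farmed = dict(honest, farm1=["target"], farm2=["target"], farm3=["target"])

before = pagerank(honest)
after = pagerank(farmed)
print(after["target"] > after["rival"])
```

In the first graph the two pages score the same. Adding three pages that nobody links to is enough to push one ahead of the other, and a test corpus with few such pages would never reveal the weakness.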
But the truth of the matter is that the world’s first two Googlers had no idea of how the Web really worked, why people were linking to the stuff they were linking to, or what the significance behind links was. They found data that supported their theory and used that data to launch history’s most successful search engine to date. Their tests predated many of the most powerful link-manipulation techniques, and I think there should be no doubt that those techniques were only developed because of Google. But the search engineering community still has no way of measuring the extent to which link manipulation occurs, much less the variation in techniques that proliferate.
How Google has changed through the years
Google has fought back against link spam with some questionable practices. They have pressured many Web site operators into implementing “rel=’nofollow’” to provide them with a first line of filtering. After several years I think this approach is having a significant benefit for both Google and the Web community, although I was extremely critical of it at first. But the nofollow solution did not solve the problem Google said it would solve.
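As a first line of filtering, the nofollow attribute is cheap for a crawler to honor. The sketch below uses Python’s standard `html.parser` module to keep only links that are not marked nofollow. The class name `FollowedLinkParser` and the sample markup are hypothetical, and nothing here is meant to describe how Google actually processes the attribute.

```python
from html.parser import HTMLParser

class FollowedLinkParser(HTMLParser):
    """Collect href values from <a> tags that are not marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        # rel may hold several space-separated tokens, e.g. "external nofollow".
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if href and "nofollow" not in rel_tokens:
            self.followed.append(href)

html = (
    '<a href="http://trusted.example/">editorial link</a>'
    '<a rel="nofollow" href="http://spam.example/">comment link</a>'
)
parser = FollowedLinkParser()
parser.feed(html)
print(parser.followed)
```

The filter only works where site operators add the attribute, which is why it depends on pressure from the search engine and does nothing about links placed on pages the spammer controls.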
Google has also resorted to “punishing” Web sites that buy links by artificially reducing their reported Toolbar PageRank values. In combatting paid links, however, Google for a while promoted the false idea that the leasing of links might somehow violate the U.S. Federal Trade Commission’s prohibition against undisclosed endorsements (a position I showed to be legally untenable because the U.S. government formally disclaims links as endorsements).
But Google should be credited with encouraging Webmasters to use ethical Web promotion standards. Being the most well-known search engine, they have a bully pulpit from which to preach a good message and some of their messages have been very good. Other search engines have followed Google’s lead in publishing Webmaster guidelines, providing Webmaster resources, and offering advice on how to be included in their indexes.
Google was one of the first search engines to openly reach out to and engage with the search engine optimization community. The engagement has not always been amicable but it helped make us better optimizers while helping search engines improve their results and services. Today’s Google is far better than the first Google and today’s search industry is much more robust and concise than that which existed 10 years ago.
The next frontier in link-based Web indexing
I feel that Google has struggled to make PageRank work in any fashion like their original concept. Once you look past the fact that Google is filtering out billions of links, it does serve up pretty good results — if not always the best or most relevant results. Relevance and authority really cannot be determined by uninformed citation. I might go out of my way to link to the resources I feel are the best available for any topic, but if I am not an expert in that topic my citations provide less than optimal value for people who need guidance on where to look.
Because of Google’s insistence on making PageRank work, many very authoritative, highly informative, and extremely useful documents are passed over in Google’s search results for link-popular content that may not even provide good information. I used to say, if all other things are equal, PageRank is as good a method for breaking an IR score tie as any system I have seen. While that remains true on an algorithmic level, we still lack algorithms that can correct themselves the way human judgement seems to be able to correct itself.
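The tie-breaking role I have in mind can be sketched in a few lines. The scores and URLs below are invented, and `rank_results` is a hypothetical helper rather than anything a search engine has published.

```python
def rank_results(results):
    """Sort by IR score first and use the link score only to break ties.

    Each result is a (url, ir_score, link_score) tuple.
    """
    return sorted(results, key=lambda r: (r[1], r[2]), reverse=True)

results = [
    ("a.example/", 0.80, 0.10),
    ("b.example/", 0.80, 0.40),
    ("c.example/", 0.90, 0.05),
]
ranked = [url for url, _, _ in rank_results(results)]
print(ranked)  # ['c.example/', 'b.example/', 'a.example/']
```

Here the page with the weakest link score still ranks first because its content matches best. Link popularity decides the order only between the two pages whose IR scores are identical.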
In that respect, I feel social media sites like DIGG have taken the citation-based concept farther than Google could have with simple PageRank/link analysis. Although social media is also subjected to link abuse, these communities tend to be far more self-regulating than search engines. Some people therefore argue that the next generation of search should be based on social media.
Links help search engines find new content, they help search engines validate that content (at least by showing the content is connected with other content), and they can show — through diversity and volume — that content is considered to be important, if not necessarily accurate or authoritative. Some link analysis algorithms have gone beyond PageRank to develop measures of trust and authority but they must all be protected by filters, moderated by human judgement, and updated to match the complexity of the Web relationships they are indexing.
Maybe a combination of three foundations (link analysis, social engineering, and semantic analysis) will bring about the next revolution in search technology. Link analysis is undoubtedly here to stay, but there was never really a golden age when PageRank worked as promoted, when link spam didn’t exist, and when search results could be fully trusted.
Author: Michael Martinez’s research into search engine optimization and ranking strategies for Inktomi’s customer search engines helped hundreds of business Web site operators improve their stability and visibility in the Inktomi search engines.