Chapter 6: All Links Are Not Created Equal: 10 Illustrations of Search Engines' Valuation of Links

Editor's Note: It is surprising how little—and how much—has changed in how search engines evaluate and utilize links since this post was written three years ago. While links are no longer the only signals that search engines use to determine the importance and popularity of a web page, they are still the strongest ones. For the most part, the principles of link valuation explained here still hold water, especially when each one is considered in the current context of relevance and quality. As search engines have become more sophisticated, they have grown increasingly adept at determining the relevance and quality of links, and they have adjusted the weight placed on those factors when valuing links accordingly.

In 1997, Google's founders created an algorithmic method to determine importance and popularity based on several key principles:

• Links on the web can be interpreted as votes that are cast by the source for the target.

• All votes are, initially, considered equal.

• Over the course of executing the algorithm on a link graph, pages which receive more votes become more important.

• More important pages cast more important votes.

• The votes that a page can cast are a function of that page's importance, divided by the number of votes/links it casts.

That algorithm, of course, was named PageRank, and it changed the course of web search, providing tremendous value to Google's early efforts around quality and relevancy in search results. (Note: You can read the paper that Sergey Brin and Lawrence “Larry” Page presented to Stanford on their prototype for a large-scale search engine at http://infolab.stanford.edu/~backrub/google.html.)
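To make the vote-passing mechanics concrete, here is a minimal sketch of that iterative idea in Python. It is not Google's implementation, just the textbook recursion: every page starts with equal importance, and each iteration divides a page's current score among the pages it links to (the 0.85 damping factor is the value suggested in the original paper).

```python
# A minimal sketch of the PageRank idea described above, not Google's
# production code: each page's score is split evenly among its outbound
# links, and more heavily linked-to pages accumulate higher scores.
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {page: 1.0 / len(pages) for page in pages}

    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for source, targets in links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)  # the vote, split across outlinks
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Tiny example graph: pages voting for one another.
graph = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
}
print(pagerank(graph))
```

Running this on the tiny example graph, "home" ends up with the highest score because it receives votes from both other pages.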

As knowledge of PageRank spread, those with a vested interest in influencing search rankings (SEOs) found ways to leverage this information for the benefit of their web pages. But Google didn't rest on its laurels in the field of link analysis; it has continuously improved the algorithm. Today, anchor text, trust, hubs and authorities, topic modeling, and even human activity are just a few of the signals Google takes into account when determining the weight a link carries. Unfortunately, many in the SEO field are still unaware of these changes and how they affect best practices for external marketing and link acquisition.

In this post, I walk through ten principles of link valuation that can be observed, tested, and (in some cases) patented. I'd like to extend special thanks to Bill Slawski from SEO by the Sea (http://www.seobythesea.com), whose recent posts on “Google's Reasonable Surfer Model” and “What Makes a Good Seed Site for Search Engine Web Crawls?” were catalysts (and sources) for this blog post.

Editor's Note: You can read these posts at www.seobythesea.com/2010/05/googles-reasonable-surfer and www.seobythesea.com/?p=3790, respectively.

As you read through the following ten principles of link valuation, please note that these are not hard and fast rules. Based on our experience, testing, and observations, though, they are accurate from our perspective. As with all things in SEO, we strongly encourage our readers to test these principles for themselves. Nothing is better for learning SEO than going out and experimenting in the wild.

Principle #1: Links Higher Up in HTML Code Cast More Powerful Votes

Whenever we test page features in what we hope are controlled environments on the web, we find that links higher up in the HTML code of a page seem to pass more ranking value than those lower down. Many other SEOs we've talked to report the same thing. This principle certainly fits with the recently granted Google patent “Ranking Documents Based on User Behavior and/or Feature Data” (www.google.com/patents/US7716225), which describes a number of link features that may influence how much value a link passes.

In testing environments, SEOs often grapple with this “higher link wins” phenomenon; it can take a surprising amount of on-page optimization to overcome the advantage the higher link carries.
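As a purely illustrative sketch (my own toy decay function, not anything from the patent), one could imagine a page's link value being apportioned so that links appearing earlier in the HTML source receive a larger share:

```python
# Hypothetical illustration: weight links by their position in the HTML
# source, with earlier links receiving a larger share of the page's value.
import re

def position_weighted_links(html, total_value=1.0, decay=0.8):
    hrefs = re.findall(r'href="([^"]+)"', html)
    raw = [decay ** i for i in range(len(hrefs))]          # earlier links weigh more
    scale = total_value / sum(raw) if raw else 0.0
    return {href: weight * scale for href, weight in zip(hrefs, raw)}

page = '<a href="/top-link">in the intro</a> ... <a href="/footer-link">in the footer</a>'
print(position_weighted_links(page))  # the first link gets the larger share
```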

Principle #2: External Links Are More Influential Than Internal Links

There's little surprise here. Recall that the original PageRank concept makes no mention of external versus internal links counting differently, but it's quite likely that other, more recently created metrics (post-1997) do reward external links over internal ones. You can see this in the correlation data from our post a few weeks back (www.moz.com/blog/the-science-of-ranking-correlations): external mozRank (i.e., the “PageRank” sent from external pages) had a much higher correlation with rankings than standard mozRank (PageRank).

I don't think it's a stretch to imagine that Google calculates external PageRank and internal PageRank separately, and uses them in different ways for page valuation.
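If that's the case, the first step is simply bucketing a page's inbound links by whether they come from the same host. A small sketch using Python's standard library (the URLs are made up):

```python
# Separate inbound links into internal and external sets so that link
# metrics could be computed for each set separately.
from urllib.parse import urlparse

def split_links(target_url, inbound_links):
    target_host = urlparse(target_url).netloc
    internal, external = [], []
    for link in inbound_links:
        (internal if urlparse(link).netloc == target_host else external).append(link)
    return internal, external

internal, external = split_links(
    "https://example.com/page",
    ["https://example.com/other", "https://blog.example.org/post"],
)
print(len(internal), "internal,", len(external), "external")
```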

Principle #3: Links From Unique Domains Matter More Than Links From Previously Linking Sites

Speaking of correlation data, no single metric is more closely correlated with how Google determines search result rankings than the number of unique domains containing an external link to a given page. This strongly suggests that a diversity component is at play in the ranking system, and that it's better to have 50 links from 50 different domains than to have 500 more links from a site that already links to you. Curiously, the original PageRank algorithm makes no provision for this. This could be one reason that site-wide links from domains with many high-PageRank pages worked so well in Google's early years.
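Measuring that diversity signal is straightforward: deduplicate inbound links by their domain before counting, as in this small sketch (the example URLs are hypothetical):

```python
# Count unique linking domains rather than raw link volume.
from urllib.parse import urlparse

def unique_linking_domains(inbound_links):
    return {urlparse(link).netloc for link in inbound_links}

links = [
    "https://site-a.com/post-1",
    "https://site-a.com/post-2",   # same domain, adds no diversity
    "https://site-b.org/review",
]
print(len(unique_linking_domains(links)))  # 2, not 3
```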

Principle #4: Links From Sites Closer to a Trusted Seed Set Pass More Value

We've talked about TrustRank on Moz (www.moz.com/blog/whiteboard-friday-domain-trust-authority), and Yahoo! has published a research paper on it, “Combating Web Spam with TrustRank” (http://www.vldb.org/conf/2004/RS15P3.PDF). Google has certainly done plenty on this front as well; Bill Slawski covers it on his blog, SEO by the Sea, in the post “Google Trust Rank Patent Granted” (www.seobythesea.com/2009/10/google-trust-rank-patent-granted).

The abstract for Google's application to patent a search engine algorithm that selects trusted seed sites explains how the search engine determines its own trust metric:

A host-based seed selection process considers factors such as quality, importance and potential yield of hosts in a decision to use a document of a host as a seed. A subset of a plurality of hosts is determined, including some but not all of the plurality of the hosts, according to an indication of importance of the hosts, according to an expected yield of new documents for the hosts, and according to preferences for the markets the hosts belong to. At least one seed is generated for each host of the determined subset of hosts, wherein each generated at least one seed includes an indication of a document in the linked database of documents. The generated seeds are provided to be accessible by a database crawler.

Linkscape's own mozTrust score functions in precisely this way, using a PageRank-like algorithm that's biased to only flow link juice from trusted seed sites rather than equally from across the web.

Editor's Note: The full text of Google's patent application “Host-Based Seed Selection Algorithm for Web Crawlers” can be read at http://mz.cm/15GWlsO.
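A seed-biased calculation of this kind can be sketched as a small variation on the earlier PageRank example: instead of every page starting with (and teleporting back to) equal importance, only the trusted seeds do, so trust can only flow outward from them. This is a minimal illustration of the general idea, not Linkscape's or Google's actual implementation.

```python
# Seed-biased PageRank sketch: only trusted seed pages receive the base
# (teleportation) score, so trust propagates outward from the seed set.
def trust_rank(links, seeds, damping=0.85, iterations=50):
    """links: page -> list of pages it links to; seeds: set of trusted pages."""
    pages = set(links) | {t for targets in links.values() for t in targets} | set(seeds)
    rank = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}

    for _ in range(iterations):
        new_rank = {p: ((1.0 - damping) / len(seeds) if p in seeds else 0.0) for p in pages}
        for source, targets in links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

graph = {"seed.gov": ["a.com"], "a.com": ["b.com"], "spam.biz": ["b.com"]}
print(trust_rank(graph, seeds={"seed.gov"}))
```

In the toy graph, spam.biz never accumulates trust (it is neither a seed nor linked to from trusted pages), so its vote for b.com carries essentially no weight.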

Principle #5: Links From “Inside” Unique Content Pass More Value Than Those From Header/Footer/Sidebar Navigation Do

Papers like Microsoft's “Vision Based Page Segmentation” (VIPS) and Google's “Document Ranking Based on Semantic Distance,” as well as Google's recent patent on the Reasonable Surfer Model, all suggest that valuing links from within the content more highly than those in sidebars or footers can help search engines avoid spam and manipulation. As webmasters and SEOs, we can certainly attest that a lot of paid links live in these sections of sites and that getting non-natural links placed inside content is much more difficult.

References

Microsoft's “Vision Based Page Segmentation” (VIPS) (http://mz.cm/16kR6Ai)

Google's “Document Ranking Based on Semantic Distances” (http://mz.cm/16kRFu2)

Google's patent on the Reasonable Surfer Model (http://www.seobythesea.com/?p=3806)
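The sketch below is my own rough simplification of that idea (not VIPS or the Reasonable Surfer model): score each link according to the page block it appears in, so links inside the main content are worth more than those in navigation or footer boilerplate.

```python
# Illustrative block-based link weighting; the tags and weights are arbitrary.
import re

BLOCK_WEIGHTS = {"main": 1.0, "article": 1.0, "nav": 0.2, "aside": 0.2, "footer": 0.1}

def weighted_links(html):
    weights = {}
    for tag, weight in BLOCK_WEIGHTS.items():
        for block in re.findall(rf"<{tag}[^>]*>(.*?)</{tag}>", html, re.S):
            for href in re.findall(r'href="([^"]+)"', block):
                weights[href] = max(weights.get(href, 0.0), weight)
    return weights

page = """
<nav><a href="/home">Home</a></nav>
<article>Great post with a <a href="/cited-source">useful reference</a>.</article>
<footer><a href="/sponsor">sponsor</a></footer>
"""
print(weighted_links(page))  # the in-content link gets the highest weight
```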

Principle #6: Keywords in HTML Text Pass More Value Than Those in Alt Attributes of Linked Images

While this principle isn't covered in any papers or patents (to my knowledge), our testing has shown that anchor text in a standard HTML text link is somehow more potent than the alt attribute text of a linked image. That's not to say we should run out and ditch image links, badges, or the alt attributes they carry. It's just good to be aware that Google seems to have this bias. (Perhaps it will be temporary.)
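For readers unfamiliar with the two forms being compared, here is a small illustrative sketch showing how the “anchor text” signal is carried by a plain text link versus the alt attribute of a linked image (the HTML and parsing code are mine, purely for illustration):

```python
# Compare the two anchor-text carriers: visible link text vs. an image's alt text.
import re

text_link  = '<a href="/widgets">blue widgets</a>'
image_link = '<a href="/widgets"><img src="badge.png" alt="blue widgets"></a>'

def anchor_signal(html):
    inner = re.search(r'<a [^>]*>(.*?)</a>', html, re.S).group(1)
    alt = re.search(r'alt="([^"]*)"', inner)
    if alt:                                      # image link: alt text stands in for anchor text
        return alt.group(1)
    return re.sub(r'<[^>]+>', '', inner).strip() # text link: the visible anchor text

print(anchor_signal(text_link), "|", anchor_signal(image_link))
```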

Principle #7: Links From More Important, Popular, Trusted Sites Pass More Value

We've likely all experienced the sinking feeling of seeing a competitor outrank us with just a handful of links that appear to come from less powerful pages. When this happens, powerful domains may be passing along value that isn't fully reflected in page-level metrics. One reason search engines may do this is to combat spam and provide more trusted results overall: granting sites that rarely link to junk significantly more power to pass value through their links (even from less important pages) than sites whose linking practices are questionable is one way to exercise quality control.
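Purely as a thought experiment (this is an assumption on my part, not a documented formula), you could imagine the value a link passes being scaled by a domain-level trust factor, so that even a modest page on a squeaky-clean domain out-votes a similar page on a questionable one:

```python
# Hypothetical illustration only: scale a link's page-level value by a
# domain trust factor in [0, 1]. The blend below is arbitrary.
def link_value(page_score, domain_trust):
    return page_score * (0.5 + 0.5 * domain_trust)

print(link_value(page_score=2.0, domain_trust=0.95))  # trusted domain, modest page
print(link_value(page_score=2.0, domain_trust=0.10))  # domain that often links to junk
```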

Principle #8: Links Contained Within Noscript Tags Pass Low, If Any, Value

While our testing certainly suggests that noscript links pass no value, we can't say this holds in every case beyond the scope of our investigations; over the years, the phenomenon has been both reported and contradicted numerous times in the search community. With that in mind, we included noscript links in Linkscape but added the option for users to filter them out. All that said, the number of links inside this tag is quite small compared to the total number of links across the web.
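A small sketch of one way such a filter could work, dropping noscript blocks before extracting links (this is an illustration, not Linkscape's actual code):

```python
# Strip <noscript> blocks before extracting links, so links inside them are ignored.
import re

def links_excluding_noscript(html):
    visible = re.sub(r"<noscript>.*?</noscript>", "", html, flags=re.S | re.I)
    return re.findall(r'href="([^"]+)"', visible)

page = '<a href="/normal">ok</a><noscript><a href="/hidden">fallback</a></noscript>'
print(links_excluding_noscript(page))  # ['/normal']
```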

Principle #9: A Burst of New Links May Enable a Document to Outrank “Stronger” Competition

Apart from Google's Query Deserves Freshness (QDF) algorithm (see http://www.moz.com/blog/whiteboard-friday-query-deserves-freshness), which may value more recently created and linked-to content in certain “trending” searches, it appears that the search engine also uses temporal signals around linking both to identify spam/manipulation and to reward pages that earn a large number of references in a short period of time. While Google's patent “Information Retrieval Based on Historical Data” first suggested the use of temporal data in 2005, the model has likely been revised and refined since then.
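As a hypothetical illustration of what a temporal link signal could look like (the threshold and window here are arbitrary inventions, not Google's), compare the newest period's link acquisition against a recent baseline:

```python
# Flag a "burst" when this period's new links far exceed the recent average.
def is_link_burst(weekly_new_links, multiplier=3.0):
    """weekly_new_links: oldest-to-newest counts of new linking domains per week."""
    *history, current = weekly_new_links
    baseline = sum(history) / len(history) if history else 0.0
    return current > multiplier * max(baseline, 1.0)

print(is_link_burst([4, 6, 5, 3, 40]))  # True: this week's links far exceed the norm
print(is_link_burst([4, 6, 5, 3, 7]))   # False: normal growth
```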

Principle #10: Legitimate Links on Pages That Also Link to Web Spam May Be Devalued

I was fascinated to see Richard Baxter's own experiments on this in his post “Google Page Level Penalty for Comment Spam” (http://seogadget.co.uk/google-page-penalty-for-comment-spam-rankings-and-traffic-drop). When his blog's comment spam filter failed, both traffic and rankings for the affected pages immediately dropped. When Baxter removed the spammy comment links, rankings and traffic returned to previous levels almost immediately.

Since then, I've been keeping an eye on some popular, valuable blog posts that have received similarly overwhelming amounts of spam, and lo and behold, the pattern seems verifiable. Webmasters would be wise to stay on top of spam removal to avoid triggering potential ranking penalties from Google (and the possible loss of link value).

Conclusion

But what about classic PageRank, the score we get a tiny inkling of from the Google toolbar's green pixels? I'd actually surmise that while many (possibly all) of the link features discussed here make their way into the ranking process, the classic concept of PR remains relatively untouched since its creation. My reasoning? Moz's own mozRank correlates remarkably well with Google toolbar PR, and mozRank is calculated with intuition very similar to that of the original PageRank paper. (mozRank differs from toolbar PR by 0.42 on average, with 0.25 being “perfect” due to the two extra significant digits mozRank displays.) If I had to guess (and I really am guessing), I'd say that Google has maintained classic PR because it finds it heuristically useful for tasks such as determining crawling and indexation priority.
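The “0.25 being perfect” remark follows from rounding alone: toolbar PR reports whole numbers, so even a score that agrees with it exactly would differ from the displayed integer by 0.25 on average, assuming fractional parts are spread evenly. A quick simulation of that assumption (not Moz's actual data):

```python
# Average |x - round(x)| for a score uniformly spread over a 0-10 scale is ~0.25.
import random

samples = [random.uniform(0, 10) for _ in range(100_000)]
avg_diff = sum(abs(x - round(x)) for x in samples) / len(samples)
print(round(avg_diff, 3))  # approximately 0.25
```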
