Sebastian's Pamphlets

If you've read my articles somewhere on the Internet, expect something different here.



Wednesday, October 05, 2005

Duplicate Content Filters are Sensitive Plants

In their everlasting war on link and index spam, search engines produce way too much collateral damage. Hierarchically structured content suffers especially from over-sensitive spam filters. The crux is that user-friendly pages need to duplicate information from upper levels. The old rule "what's good for users will be honored by the engines" no longer applies.

In fact, the problem is not the legitimate duplication of key information from other pages; the problem is that duplicate content filters are sensitive plants, unable to distinguish useful repetition from automatically generated artificial spider fodder. The engines won't lower their spam threshold, which means they will not fix this persistent bug in the near future, so Web site owners have to either live with decreasing search engine traffic or react. The question is: what can a Webmaster do to escape the dilemma without turning the site into a useless nightmare for visitors by eliminating all textual redundancy?

The major fault of Google's newer dupe filters is that their block-level analysis often fails to categorize page areas correctly. Web page elements in and near the body area that contain duplicated key information from upper levels are treated as content blocks, not as part of the page template where they logically belong. As long as those text blocks reside in separate HTML block-level elements, it should be quite easy to rearrange those elements so that the duplicated text becomes part of the page template, which should be safe at least with somewhat intelligent dupe filters.
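A minimal sketch of that rearrangement, written in Python for illustration. It assumes a hypothetical page model in which the text inherited from upper levels is already available separately from the page's own text; the function `render_page` and the element ids `template-header`, `upper-level-info` and `content` are made up. It only shows the markup structure (duplicated text in its own block inside the template area, unique text in the body); it does not claim anything about how a particular engine will classify these blocks.

```python
from html import escape


def render_page(title: str, inherited_text: str, unique_text: str) -> str:
    """Render a page with the duplicated upper-level text placed in its own
    block-level element inside the template area, apart from the body.
    Element ids are hypothetical, chosen only for this illustration."""
    t, inherited, unique = escape(title), escape(inherited_text), escape(unique_text)
    return "\n".join([
        f"<html><head><title>{t}</title></head><body>",
        # Template area: site navigation plus key information repeated from
        # upper levels (e.g. the category blurb shown on every product page).
        '<div id="template-header">',
        f'  <div id="upper-level-info">{inherited}</div>',
        "</div>",
        # Body area: only the text that is unique to this page.
        '<div id="content">',
        f"  <h1>{t}</h1>",
        f"  <p>{unique}</p>",
        "</div>",
        "</body></html>",
    ])


print(render_page("Blue Widget", "All widgets ship free.", "Steel handle & hook."))
```

The point of the design is that the repeated text never shares a block-level element with the unique text, so it can be moved or styled as part of the template without touching the body content.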

Unfortunately, the raw data very often aren't normalized; for example, the text duplication happens within a description field in a database's products table. That's a major design flaw, and it must be corrected before block-level elements can be manipulated properly, that is, declared as part of the template rather than part of the page body.
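As an illustration of that correction, here is a minimal Python/SQLite sketch. It assumes a hypothetical schema where a `categories` table holds the shared blurb and each row in `products` has that blurb copied verbatim into its `description`; the table names, column names and `normalize_descriptions` function are invented for this example. Paraphrased or partially edited copies would not be caught and would need manual cleanup.

```python
import sqlite3


def normalize_descriptions(conn: sqlite3.Connection) -> None:
    """Strip the category blurb that was copied into each product's
    description, so the shared text lives only in the categories table.
    Schema (categories/products) is hypothetical."""
    rows = conn.execute(
        "SELECT p.id, p.description, c.description "
        "FROM products AS p JOIN categories AS c ON p.category_id = c.id"
    ).fetchall()  # fetch everything first, then update
    for product_id, product_desc, category_desc in rows:
        if category_desc and category_desc in product_desc:
            # Remove the first verbatim copy of the category text, tidy whitespace.
            unique = product_desc.replace(category_desc, "", 1).strip()
            conn.execute(
                "UPDATE products SET description = ? WHERE id = ?",
                (unique, product_id),
            )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE categories (id INTEGER PRIMARY KEY, description TEXT);
        CREATE TABLE products (id INTEGER PRIMARY KEY, category_id INTEGER,
                               description TEXT);
        INSERT INTO categories VALUES (1, 'All widgets ship free.');
        INSERT INTO products VALUES
            (1, 1, 'All widgets ship free. Blue, with a steel handle.'),
            (2, 1, 'All widgets ship free. Red, compact size.');
        """
    )
    normalize_descriptions(conn)
    for row in conn.execute("SELECT id, description FROM products ORDER BY id"):
        print(row)  # (1, 'Blue, with a steel handle.') then (2, 'Red, compact size.')
```

Once the shared text is stored only once, the template can pull it from the category record and render it in its own block, while the product description feeds the page body.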

My article "Feed Duplicate Content Filters Properly" explains a method for revamping the page templates of eCommerce sites at the block level. The principle outlined there can be applied to other hierarchical content structures too.
