SEO Article: Avoiding Duplicate Content and the canonical link.

Duplicate Content and Polluted URLs

It stands to reason that search engines do not want multiple entries in their results for the same content. After all, they don’t want people deploying huge numbers of domain names all pointing to the same content in an effort to ensure visibility in search results. In fact, URLs identified doing this will often be punished by removal from the results.

So the need to avoid this is obvious and the appropriate remedy is clear. However, the problem of many URLs pointing to the same content is much subtler and well worth considering, given the penalties for being ‘caught’.

Googlebot will index http://www.mySite.co.uk and http://mySite.co.uk as different websites.

This also extends to http://www.mySite.co.uk/index.html. Most web authors probably want all these variations to mean the same thing. However, Googlebot sees them as separate, and may well flag your website for punishment as you appear to be providing duplicate content, i.e. several URLs for one page of content.
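To make the point concrete, here is a minimal Python sketch that collapses the three variations above onto one form. The choice of the www host as the preferred one, the treatment of index.html as the default document, and the function name normalise are all assumptions made for illustration; this is not how Googlebot itself compares URLs.

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "www.mysite.co.uk"  # assumption: the www form is preferred
BARE_HOST = "mysite.co.uk"
INDEX_PAGES = ("index.html",)        # assumption: index.html is the default document


def normalise(url: str) -> str:
    """Collapse common variants of a URL onto one preferred form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()      # host names are case-insensitive
    if host == BARE_HOST:
        host = PREFERRED_HOST
    path = parts.path or "/"         # an empty path means the site root
    for page in INDEX_PAGES:
        if path.endswith("/" + page):
            path = path[: -len(page)]  # keep the trailing slash
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


variants = [
    "http://www.mySite.co.uk",
    "http://mySite.co.uk",
    "http://www.mySite.co.uk/index.html",
]
unique = {normalise(u) for u in variants}
print(unique)
```

Three different strings go in, but only one URL comes out, which is what most web authors have in mind in the first place.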

One way to avoid this issue is to use host redirects, i.e. a 301 for permanent changes. This can be problematic with some hosts who do not offer this option, or if your domain name is hosted by one organisation and the web files are hosted elsewhere. It can also be expensive as some hosts charge for this ‘domain mapping’ service.
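If your host does not offer redirects but you can run your own code, the decision behind a 301 is small. The following Python sketch shows only that decision; the host names and the function name redirect_for are hypothetical, and a real deployment would normally do this in the web server configuration rather than in application code.

```python
from typing import Optional, Tuple

PREFERRED_HOST = "www.mysite.co.uk"  # assumption: the www form is preferred
ALIASES = {"mysite.co.uk"}           # other host names that serve the same site


def redirect_for(host: str, path: str) -> Optional[Tuple[int, str]]:
    """Return (status, Location) if the request should be redirected, else None.

    `host` is the value of the Host header (without a port) and `path`
    is the requested path, including any query string.
    """
    if host.lower() in ALIASES:
        # 301 tells browsers and bots that the move is permanent
        return 301, f"http://{PREFERRED_HOST}{path}"
    return None


print(redirect_for("mySite.co.uk", "/about.html"))
print(redirect_for("www.mySite.co.uk", "/about.html"))
```

A request for the bare domain is sent to the same path on the www host, while a request that already uses the preferred host is left alone.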

The best policy is to avoid the ambiguity from the start – choose your preferred URL and stick to it. Once decided upon, it should be used for everything including sitemaps, directory submissions, internal website linking etc. Of course, you cannot be responsible for how other webmasters choose to link to your site, but you do now have additional options. The new option is to use the canonical link tag.

Canonical

<link rel="canonical" href="http://www.mySite.co.uk/" />

The canonical tag permits bots to unambiguously link a URL to content. Google considers this tag a hint rather than a directive when indexing. Bing does too, but apparently there are differences as to when each search engine determines it is appropriate to use it or not. My current impression is that Yahoo doesn’t take any notice of this at the time of writing (2010).

The usefulness of this tag extends beyond the need to clarify a URL. It permits web authors to automagically clean up links that get harvested by search engines. Essentially it permits you to designate the intended page, stripped of session IDs, query parameters etc. For example, if you had the following in your web page’s header:

<link rel="canonical" href="http://www.mySite.co.uk/default.asp" />

Then Google (if it takes the hint) will convert this URL:

http://www.mySite.co.uk/default.asp?q=zippy&id=909

Into this cleaned up version:

http://www.mySite.co.uk/default.asp

But only if Google takes your hint.
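If your pages are generated by a script, the tag can be built from the requested URL rather than typed by hand. Here is a rough Python sketch of the idea; the function names canonical_url and canonical_tag are made up for illustration, and it assumes that none of the query parameters select different content (if one does, it belongs in the canonical URL).

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Drop the query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_tag(url: str) -> str:
    """Build the <link> element to place in the page's header."""
    href = escape(canonical_url(url), quote=True)  # safe inside an attribute
    return f'<link rel="canonical" href="{href}" />'


tag = canonical_tag("http://www.mySite.co.uk/default.asp?q=zippy&id=909")
print(tag)
```

Whatever session IDs or tracking parameters a visitor arrives with, every version of the page then carries the same canonical tag.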

URL Strategy

  • Place a 301 permanent redirect from the http://mySite.co.uk version of your domain to the http://www.mySite.co.uk version.  This lets search engines know where the search results should point to.
  • Ensure web pages have their own appropriate canonical tag.

Google’s description of this tag is blogged here and their YouTube video discussing the use of canonical can be found here.

Summary

  • Decide on one domain representation for your site and consistently use only that.  This includes site submissions, internal linkage, sitemaps and site registrations across the internet.
  • Use web server 301 permanent redirects to let search engines know your preferred URL for indexing.
  • Use canonical tags as appropriate to cleanse URLs.

Site launch: www.CliffordsMesne.Co.UK

Portal site for the village of Cliffords Mesne launched (what a great name!), complete with seasonal landing page theme.

The site comes with thematically grouped database entries that will help users find what they want more easily, and also a local interactive map courtesy of the Ordnance Survey OpenSpace programme.

www.Vergence.Co.UK relaunched

The vergence home site has had a major overhaul! As of today our web design site looks, well, just is much, much better!

The uncluttered clean look is also augmented with an *almost* new blog, the main purpose of which will be to trumpet vergence activities in a less formal manner and also discuss anything web design related.

Site Launch: www.Sudbrook.Info

www.Sudbrook.Info, a website about the village of Sudbrook in Monmouthshire, is officially launched!

Background

The previous site consisted of a Flash front-end with about 80 database records providing the content. The brief was to overhaul the content by expanding the database and to provide a new, clean, search engine friendly front-end. Content should consist of original researched items, scanned documents and links to external sources of information.

Development

The overhaul started by deploying a web spider to trawl the web for relevant new content. Although this is ostensibly a site about history, it was felt that the new version should reflect Sudbrook through the ages, and thus all web content was fair game to be linked to the database.

An initial dataset of about 400 additional records was generated and then distilled to approximately 80 new worthwhile records. Many of the discarded records were simply duplicates, the wrong Sudbrook or the result of geo-spamming by estate agents and advertising portals.

Site Design

The original choice of colours complementary to sepia was kept. This was a matter of personal preference, and it is appropriate for many of the historical images to be displayed on the site.

It was decided to incorporate a number of methods for users to access the information. The primary means is by using the Search Box which lists results like a search engine allowing users to choose exactly what they want. Alternatively a ‘slide show’ option permits users to see each result in full. Both options are sorted chronologically.

The database entries were manually tagged with thematic keywords, which permit the use of additional menus allowing users to rapidly locate appropriate records, e.g. a click on ‘Rail’ brings up all the records concerned with the Severn Tunnel etc.
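The tagging idea can be sketched in a few lines of Python. This is not the site's ASP code: the record fields, the sample data (titles and years are invented) and the function names are hypothetical, and sorting each menu result by year is an assumption carried over from the search options.

```python
from collections import defaultdict

# Hypothetical records; titles and years are made up for illustration
records = [
    {"id": 1, "title": "Tunnel item", "year": 1950, "tags": ["Rail"]},
    {"id": 2, "title": "Church item", "year": 1800, "tags": ["Church"]},
    {"id": 3, "title": "Pumping station item", "year": 1900, "tags": ["Rail", "Industry"]},
]


def build_tag_index(rows):
    """Map each lower-cased tag to its records, oldest first."""
    index = defaultdict(list)
    for row in rows:
        for tag in row["tags"]:
            index[tag.lower()].append(row)
    for matches in index.values():
        matches.sort(key=lambda r: r["year"])
    return index


def records_for(index, tag):
    """Return the records behind one menu entry (empty list if none)."""
    return index.get(tag.lower(), [])


index = build_tag_index(records)
rail_ids = [r["id"] for r in records_for(index, "Rail")]
print(rail_ids)
```

Each menu entry is then just a lookup by tag, so adding a new theme only means tagging the relevant records.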

The database was also linked to an interactive map provided by the Ordnance Survey. Their API provided the scrollable/scalable base map for Sudbrook and the surrounding area, to which appropriate geo-tags were added linking records to a physical location. It was fun making ASP dynamically create the JavaScript to dynamically generate the code required by the API to place all those pins :)
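For the curious, the trick of having server-side code write client-side code looks roughly like this. The sketch uses Python as a stand-in for the ASP on the site, and addPin is a made-up JavaScript function name, not a call from the Ordnance Survey API; the coordinates and title are invented too.

```python
import json


def pins_script(records):
    """Emit one JavaScript call per geo-tagged record.

    json.dumps turns numbers and strings into valid JavaScript literals,
    so quotes inside a title cannot break the generated code. When the
    output is placed in an inline script block, "</" would need further
    escaping.
    """
    lines = []
    for record in records:
        args = ", ".join(
            json.dumps(value)
            for value in (record["lat"], record["lon"], record["title"])
        )
        lines.append(f"addPin({args});")  # addPin is hypothetical
    return "\n".join(lines)


script = pins_script([{"lat": 51.5, "lon": -2.75, "title": 'The "old" quay'}])
print(script)
```

The server loops over the database rows once and the browser receives plain JavaScript, one pin per record.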

The site also needed a guestbook with some measure of spam denial automatically built in, especially as it had been targeted during development for promoting suspicious pills!

Technical

HTML, CSS with ASP talking to the database. Chiefly an exercise in CSS and div tags!