Duplicate content is a term that has been bandied about in SEO circles for years, and it certainly does cause issues for the websites concerned. But there are many different types of duplicate content, so we thought it would be a good idea to write a blog post explaining them, where they might occur and some checks you can do to make sure your website isn’t suffering.
What is Duplicate Content?
Quite simply, it is exactly that: a piece of content that is present in more than one place on the web. This could be on the same domain (website), or it could appear on two or more different websites. ‘Content’ can be any website asset: text, an image or an infographic.
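If you want to check your own pages for exact copies of text, one simple approach is to normalise each page’s text and compare fingerprints (hashes). The Python sketch below assumes you already have the visible text of each page, so fetching and HTML parsing are left out, and the `fingerprint` and `find_exact_duplicates` helpers are hypothetical illustrations rather than part of any SEO tool. It only catches copies that match word for word once case and whitespace are ignored.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Return a SHA-256 hash of the text after lower-casing and collapsing whitespace."""
    normalised = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def find_exact_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose text is identical after normalisation.

    `pages` maps a URL to the plain text of that page (assumed already extracted).
    Only groups containing two or more URLs are returned.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


if __name__ == "__main__":
    # Hypothetical example pages: the first two differ only in case and line breaks.
    pages = {
        "https://www.example.com/red-shoes": "Red leather shoes.\nHand made.",
        "https://www.example.com/sale/red-shoes": "red leather shoes. hand made.",
        "https://www.example.com/blue-shoes": "Blue suede shoes.",
    }
    print(find_exact_duplicates(pages))
```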
Duplicate Content Issues to Fix
If the duplication occurs on the same website, the problem is that Google will spend time crawling every copy and then make its own decision about which one to rank and index and which to drop from its index, and it may not choose the one you would. Here are some of the types of duplicate content that can occur:
- Duplicated pages – Any slight difference in a URL can be treated as a different page, so you may not realise you are presenting five different pages with identical content to Google. For example, these URLs may all be seen as different pages even though they contain exactly the same content:
http://www.example.com
https://www.example.com
http://example.com
http://www.example.com/
https://www.example.com/
- www and non-www – as shown above. Choose the version you want indexed, 301-redirect the other version to it and point your canonical tags at it.
- HTTP and HTTPS – If your old, insecure HTTP pages are still indexable and have not been correctly redirected to their HTTPS equivalents, this will cause duplication. Take a look at our guide to switching to HTTPS for more information.
- Duplicated websites – Businesses often buy the .co.uk, .com and .net versions of their domain. Make sure they aren’t all live with the entire website duplicated on each; redirect them to the one domain that is live and that you want indexed.
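- Development or staging websites – If your developer keeps a copy of your website for development purposes, make sure it is NOT available for search engines to crawl and index. A robots.txt file can be used to block crawlers, although password-protecting the staging site is more reliable, because robots.txt stops crawling rather than indexing.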
- Duplicated products – Some ecommerce websites show the same product in different areas or sections of the site, using a different URL but the same product content. This needs to be resolved either with canonical tags or with structural changes.
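- Product variations – Some websites create a separate page with the same product content for each specification, such as a colour or a clothing size. Point canonical tags on the variant pages at the main product page. Robots.txt can also keep crawlers away from variant URLs, but bear in mind that Google cannot read the canonical tag on a page it is blocked from crawling.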
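- Mobile websites – Most websites are now responsive, so the same pages are shown in both desktop and mobile search results. However, some have a separate mobile version of the site (on an m. subdomain, for example), which will cause duplication unless the desktop and mobile pages are linked with alternate and canonical tags.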
- Blog tags and categories – If a blog has lots of similar tags, each with only one post tagged, Google will see multiple URLs with the same blog content on each. Similarly, if a blog archive page shows only one post, and shows it in its entirety, that will also count as duplicate content.
- Paginated pages – For example shoes/page-1, shoes/page-2 and so on. This can be resolved with tags in the page code or with rules in the robots.txt file, to show Google which version is the original and which is to be indexed.
- Printable versions – Printable PDFs and similar versions are identical to the original page, so remove them from Google’s index with noindex, nofollow directives (for a PDF, these can be sent in an X-Robots-Tag HTTP header).
- Duplicated titles and descriptions – These confuse both Google and people. If titles or meta descriptions are duplicated, Google will most likely display its own instead, which gives you less control over your search listing and can reduce click-through rates.
- Dynamically generated URL parameters – Websites that add parameters to their URLs for sorting, filtering or tracking (such as ?sort=price) can generate dozens of similar pages with the same content.
- Blatant stealing – A lot of people now know that it is not a good idea to pinch other people’s content: Google will see that it is not unique, and the original indexed content will usually win out over the plagiarised version.
- Product descriptions – Using the manufacturer’s descriptions on all of your products means your product pages are identical to the manufacturer’s website and, most likely, to lots of other suppliers’ websites too. Write unique content.
- International websites – Some global businesses have a separate website for each country, with little difference other than the currency. Implementing hreflang tags tells Google which site should be shown in which country.
- Business branches – Some companies run multiple versions of their website for different cities, with similar content served on each: for example, one blog post written each month and then published on 20 websites. Use canonical tags to indicate the original.
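Several of the items above (www and non-www, HTTP and HTTPS, trailing slashes and URL parameters) come down to one page being reachable at many URLs. A quick way to audit a list of crawled URLs is to reduce each one to a normalised key and look for keys shared by more than one URL. This is a minimal Python sketch: `canonical_key`, `group_variants` and the `IGNORED_PARAMS` set are hypothetical, and the rules they apply (ignore the scheme, ignore a leading www., ignore trailing slashes, ignore the listed parameters) are assumptions to adjust for your own site. It only finds candidates; the fix is still a 301 redirect or a canonical tag.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters assumed NOT to change the page content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "sort"}


def canonical_key(url: str) -> str:
    """Collapse common URL variants into one comparison key.

    Assumptions: the scheme is ignored (every key uses https), a leading
    "www." is ignored, trailing slashes are ignored, and parameters listed
    in IGNORED_PARAMS are dropped. Remaining parameters are sorted.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS)
    return urlunsplit(("https", host, path, urlencode(kept), ""))


def group_variants(urls: list[str]) -> dict[str, list[str]]:
    """Return only the keys that more than one crawled URL maps to."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        groups[canonical_key(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


if __name__ == "__main__":
    crawled = [
        "http://www.example.com",
        "https://www.example.com",
        "http://example.com",
        "http://www.example.com/",
        "https://www.example.com/",
        "https://www.example.com/shoes?sort=price",
        "https://www.example.com/shoes/",
    ]
    for key, variants in group_variants(crawled).items():
        print(key, "<-", variants)
```

With these assumptions, the five homepage URLs from the duplicated pages example all share one key, and the two shoes URLs share another. Each group is a set of pages that should be redirected or canonicalised to a single version.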
Remember, the causes of duplication do not have to be malicious for duplicate pages to be filtered out of Google’s index.
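Exact fingerprints will not catch pages that are only nearly identical, such as a manufacturer’s description with a word or two changed, or the same blog post lightly edited for each branch website. One common technique for spotting these is to break each text into overlapping runs of words (“shingles”) and compare the sets with Jaccard similarity. The Python sketch below illustrates that technique only; it is not how Google measures duplication, and the helper names, the shingle size of 3 and the default threshold of 0.8 are all assumptions.

```python
import re
from itertools import combinations


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping `size`-word sequences in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Intersection size divided by union size (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(pages: dict[str, str], threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return URL pairs whose shingle sets are at least `threshold` similar.

    Compares every pair of pages, which is fine for a small site; large sites
    usually approximate this with techniques such as MinHash.
    """
    sets = {url: shingles(text) for url, text in pages.items()}
    pairs = []
    for (url_a, set_a), (url_b, set_b) in combinations(sorted(sets.items()), 2):
        score = jaccard(set_a, set_b)
        if score >= threshold:
            pairs.append((url_a, url_b, round(score, 2)))
    return pairs


if __name__ == "__main__":
    # Hypothetical branch sites publishing a lightly edited copy of one post.
    pages = {
        "https://branch-a.example.com/blog/boiler-tips": "Five simple checks to keep your boiler running safely through the winter months.",
        "https://branch-b.example.com/blog/boiler-tips": "Five simple checks to keep your boiler running safely through the cold winter months.",
        "https://branch-a.example.com/blog/garden-taps": "How to protect outdoor taps from frost damage.",
    }
    for first, second, score in near_duplicates(pages, threshold=0.6):
        print(f"{score:.2f}  {first}  {second}")
```

Adding a single word (“cold”) to the short branch post knocks out several shingles, so the two boiler posts score about 0.64 rather than close to 1. That is why the example lowers the threshold; in practice you would tune it against pages you already know are duplicates.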
So Why is All this Duplication Stuff an Issue?
In short, if Google has to waste time crawling lots of similar content on your website, your ‘crawl budget’ is spent on content Google is not interested in. As a result, your important pages may not be crawled as often as they should be.
Another issue is link juice dilution. If links are built up to both versions of your content, their value is split between the two and neither version will be particularly strong.
The bottom line is that you will be outranked by websites that don’t have these issues, and much of your content, however good it is, will be ignored.
Can we help?
If you are worried about duplicated content on your website and want some advice or a helping hand, give one of our team a call on 01285 50 55 50.