SEO: Duplicate Content (DC) – Concrete and Theoretical Problems
Written by Mikkel deMib Svendsen
One of the biggest technical SEO challenges is still Duplicate Content. And one of the biggest challenges with Duplicate Content is understanding, relating to, and fixing both the concrete and the theoretical issues.
The concrete Duplicate Content issues are not that difficult to understand. They are usually relatively easy to spot, and you may already be able to see the damage they do to your website's visibility in Google.
The theoretical Duplicate Content problems are much harder. You cannot see them immediately, and even if you find them, it is not necessarily easy to understand, or to convince others (e.g. your boss, who has to pay for it), why something should be done about them.
In this post I will take a closer look at the two types of Duplicate Content and why you should definitely do something to protect yourself from both.
What is Duplicate Content and what is the problem with it?
Duplicate Content is a term we use for completely or almost identical content that Google (and other search engines) can access via more than a single unique URL.
It makes no sense for Google to index the same, or virtually the same, content on many different URLs. So if they find it, they will usually remove one or more of the identical pages from their index.
Which version they remove is unfortunately not always predictable. Often they will keep the one they found first, or the one with the greatest authority. But not always.
Apart from the fact that it is of course a pity if good, optimized content you have created is removed from Google's index, Duplicate Content filtering also has a far more extensive negative impact.
If Google finds Duplicate Content on your website and filters out those pages, it has a negative impact on your entire domain, and it will therefore be harder to get even the good pages you have left in the index to rank well.
Concrete Duplicate Content Issues
There are many specific Duplicate Content issues. Common to them all is that they are identifiable if you look closely.
Some of the most common are:
Identical page TITLEs on your website
Identical META descriptions on your website
Reused content across pages on your own website or across sites
Publishing your website on multiple domains (including with and without www.)
Pagination of pages
Products that are in several categories in your webshop
Filtering and sorting products in your webshop
Product descriptions that you get from the supplier and which are used by many webshops
You will often be able to find many of the above problems by searching in Google for unique pieces of text from your pages. But not always: if the pages have already been filtered out by Google, you will not find them.
If you have the opportunity, it may therefore be necessary to search directly in the content you have on your website.
I would also recommend that you crawl your website with e.g. Screaming Frog. It can help identify identical TITLEs, identical META descriptions and URLs that are very similar to each other.
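If you have crawl data to work from, duplicate TITLEs and META descriptions can also be grouped with a few lines of code. A minimal Python sketch, assuming the crawl output (e.g. a Screaming Frog export) has already been loaded as (url, title, meta) tuples; the sample pages are made up:

```python
# Group crawled pages by identical TITLE and META description.
from collections import defaultdict

def find_duplicates(pages):
    """Return {(title, meta): [urls]} for values shared by 2+ URLs."""
    groups = defaultdict(list)
    for url, title, meta in pages:
        # Normalise whitespace and case so trivially different copies still match
        key = (" ".join(title.split()).lower(), " ".join(meta.split()).lower())
        groups[key].append(url)
    return {k: v for k, v in groups.items() if len(v) > 1}

# Made-up crawl data: the first two pages share TITLE and META description
pages = [
    ("/page-1.html", "Blue Widgets", "Buy blue widgets"),
    ("/page-2.html", "Blue  widgets", "Buy blue widgets"),
    ("/page-3.html", "Red Widgets", "Buy red widgets"),
]
print(find_duplicates(pages))
```

Any group with more than one URL is a candidate for a closer look.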
Theoretical Duplicate Content Issues
It is a little harder to identify the theoretical Duplicate Content problems. You cannot see them, and they may not be causing you any problems right now.
So why spend time on them if they are not causing problems?
The trouble with theoretical Duplicate Content problems is that they can hide like landmines under your website. If at some point you are unlucky and Google steps on one of them, you may suddenly see large parts of your pages disappear from the index.
A few of the most common theoretical Duplicate Content problems are:
If your URLs can be accessed with a different mix of uppercase and lowercase characters than the one you use in your internal links. For example, if you have a page at /page-1.html and it also responds at /PAGE-1.html, that can cause Duplicate Content problems
If your pages can be requested with parameters you do not use. We sometimes see other sites linking with a tracking parameter, so a link to your page /page-1.html becomes /page-1.html?tracking=1234. This can cause Duplicate Content issues
If your pages can be accessed via multiple sub-domains or wildcard domains
Incorrect server header codes – e.g. 200 OK (often before or after one or more redirects) on non-existent pages that should return a 404 code
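To illustrate the first two points, the case and parameter variants can all be collapsed to a single canonical form. A Python sketch, assuming your convention is lowercase paths and that `ALLOWED_PARAMS` (a made-up whitelist) names the only query parameters you actually use:

```python
# Collapse URL variants (case, unknown tracking parameters) to one canonical form.
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

ALLOWED_PARAMS = {"page", "q"}  # assumed whitelist of parameters your site uses

def canonicalize(url):
    scheme, netloc, path, query, _ = urlsplit(url)
    # Keep only whitelisted parameters; drop tracking parameters and the fragment
    params = [(k, v) for k, v in parse_qsl(query) if k in ALLOWED_PARAMS]
    return urlunsplit((scheme, netloc.lower(), path.lower(),
                       urlencode(params), ""))

# A case variant with a tracking parameter collapses to the canonical URL:
print(canonicalize("https://example.com/PAGE-1.html?tracking=1234"))
# → https://example.com/page-1.html
```

If two requested URLs canonicalize to the same string, they are the kind of variants that can trip the landmine described above.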
How To Fix Duplicate Content Issues?
Unfortunately, there is no single solution to all Duplicate Content issues that can be implemented on all websites.
Google recommends that you use CANONICAL tags to resolve many Duplicate Content issues. It is just not a very robust solution, and it cannot solve everything.
The problem with CANONICAL tags is that they require Google to read and interpret them correctly. Unfortunately, that does not always happen, and when it fails, the problem is yours – not Google's. Several times in recent years I have seen Google fail to interpret otherwise correctly implemented CANONICAL tags.
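For reference, a CANONICAL tag is a single `link` element in the page's `head`, pointing at the URL you want Google to treat as the preferred version. The URLs here are illustrative:

```html
<!-- In the <head> of /page-1.html?tracking=1234 -->
<link rel="canonical" href="https://www.example.com/page-1.html">
```

Every duplicate variant carries the same tag, all pointing at the one canonical URL.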
Another, and far better, way to resolve most Duplicate Content issues is to fix them at the server level. Then there is nothing for Google to interpret.
In short, a server-based solution checks, when a page is requested, whether it was called in the correct format, and if it was not, responds with a 301 redirect to the correct URL.
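The check-and-redirect step described above can be sketched as a small piece of middleware. This is only an illustration, assuming a Python/WSGI stack and a canonical convention of lowercase paths with no query string; your own rules would differ:

```python
# Server-level Duplicate Content fix: 301-redirect non-canonical URLs.
def canonical_redirect(app):
    """WSGI middleware: check each request's URL format before serving it."""
    def wrapper(environ, start_response):
        path = environ.get("PATH_INFO", "")
        query = environ.get("QUERY_STRING", "")
        # Assumed canonical form: all-lowercase path, no query string
        canonical = path.lower()
        if path != canonical or query:
            # Wrong format: answer with a permanent redirect to the correct URL
            start_response("301 Moved Permanently",
                           [("Location", canonical)])
            return [b""]
        # Correct format: let the application serve the page as normal
        return app(environ, start_response)
    return wrapper
```

Wrapping your application in `canonical_redirect` means every variant URL answers with a single 301 hop to the one correct address, so there is never a second indexable copy.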
It does not solve problems with reused text, identical META data and so on, but it does solve a large part of the technical challenges.