Fixing Duplicate Content On Your Website

Duplicate content is exactly what it sounds like: the same content appearing on more than one page, which search engines discover when they crawl your website.

It generally happens unintentionally, unless you’re copying or scraping content from other websites.

Search engines prefer original content, so your SEO strategy should address duplicate content errors; otherwise, your rankings in search results may suffer.

What Causes Duplicate Content & How to Solve It

Trailing Slashes

Here are two URLs to our blog page:

  1. https://searchfreaks.com/blog
  2. https://searchfreaks.com/blog/

The only difference between the two links is the trailing slash, “/”, but search engines view these as two different pages, which creates a potential duplicate content issue.

To prevent this from impacting our search results, our web server has been set up to automatically redirect all requests to the URL with the trailing slash:

[Screenshot: automatic 301 redirect to include trailing slashes, as reported by wheregoes.com.]

Fixing Trailing Slashes

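To automatically add a trailing slash to all requests, add the matching rule to your Apache or Nginx server configuration:

For Apache servers (.htaccess, after RewriteEngine On):

RewriteCond %{REQUEST_URI} !(/$|\.)
RewriteRule ^(.*)$ /$1/ [R=301,L]

The condition skips URLs that already end in a slash, which would otherwise redirect in an endless loop, and URLs that contain a dot, such as /style.css, which are usually files.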

For Nginx servers:

rewrite ^([^.]*[^/])$ $1/ permanent;

Alternatively, you can remove trailing slashes by using the following rules:

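For Apache servers (.htaccess, after RewriteEngine On):

RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^/?(.+)/$ /$1 [R=301,L]

The condition leaves real directories alone; otherwise Apache’s mod_dir would add their trailing slash back, creating a redirect loop.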

For Nginx servers:

rewrite ^/(.*)/$ /$1 permanent;

Be sure to validate your redirects using a free tool like wheregoes.com.
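If you want to check your URL policy outside the server, for example in a crawl audit script, the logic of these rules can be sketched in Python. This is a minimal sketch, not a replacement for the server config: the function names are hypothetical, and like the Nginx rule above it assumes that any path containing a dot (such as /style.css) is a file that should be left alone.

```python
from urllib.parse import urlsplit, urlunsplit


# Hypothetical helpers that model the rewrite rules above for auditing URLs.
def add_trailing_slash(url: str) -> str:
    """Return url with a trailing slash on its path. Paths that already end
    in "/" or contain a dot (assumed to be files, like /style.css) are kept."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if not path.endswith("/") and "." not in path:
        path += "/"
    return urlunsplit(parts._replace(path=path))


def remove_trailing_slash(url: str) -> str:
    """Return url without trailing slashes on its path; the root "/" is kept."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(parts._replace(path=path))


if __name__ == "__main__":
    for u in ("https://searchfreaks.com/blog", "https://searchfreaks.com/blog/"):
        print(u, "->", add_trailing_slash(u), "|", remove_trailing_slash(u))
```

Under either policy, https://searchfreaks.com/blog and https://searchfreaks.com/blog/ collapse into a single URL, which is the whole point of the redirect.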

Page Numbers

Websites often use pagination, or page numbers, to split long lists of items across several pages, such as:

  • blog lists
  • product collections
  • product reviews
  • product variants

[Screenshot: an example of how we use pagination on our blog index.]

When you browse through these pages, each page uses a variation of the original URL, which search engines see as a different page:

  • blog index page 1: https://searchfreaks.com/blog/
  • blog index page 2: https://searchfreaks.com/blog/page/2/

Since search engines will crawl each page/variation of a list by default, this can sometimes create duplicate content issues.

Fixing Page Numbers

Duplicate content caused by pagination can be fixed by using canonical tags, no-indexing, or both.

It’s also good practice to add rel="next" and rel="prev" link tags to paginated pages, although Google has stated that it no longer uses them as an indexing signal.

On SearchFreaks.com, we use a self-referencing canonical tag on each page of our blog index, along with a noindex directive on all pages after page 1:

Blog Index Page 1:

<head>
...
<link rel="canonical" href="https://searchfreaks.com/blog/">
<link rel="next" href="https://searchfreaks.com/blog/page/2/">
...
</head>

Blog Index Page 2:

<head>
...
<meta name="robots" content="noindex, follow">
<link rel="canonical" href="https://searchfreaks.com/blog/page/2/">
<link rel="prev" href="https://searchfreaks.com/blog/">
<link rel="next" href="https://searchfreaks.com/blog/page/3/">
...
</head>

The above solution ensures that every page can be crawled by search engines, while the pages after page 1 are discouraged from being indexed in search results.

In our scenario, each page of our blog index contains unique content, so each page gets a self-referencing canonical tag.

However, if these pages had more overlapping content, we would instead point the canonical tag of every page to page 1:

<head>
...
<link rel="canonical" href="https://searchfreaks.com/blog/">
...
</head>

Note that the right solution varies between websites, and that CMS themes and SEO plugins may implement these tags differently.
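To make the two setups concrete, here’s a small Python sketch that builds the <head> tags for page n of a paginated list. It assumes the /blog/page/N/ URL pattern shown above; the function names and the canonical_to_first switch are hypothetical, not part of any CMS or plugin. It emits noindex, follow (rather than nofollow) on later pages so crawlers can still follow links to posts that only appear deep in the list.

```python
BASE = "https://searchfreaks.com/blog/"  # blog index URL from the examples above


# Hypothetical helpers for illustration, not a CMS or plugin API.
def page_url(n: int, base: str = BASE) -> str:
    """URL of page n, using the /blog/page/N/ pattern shown above."""
    return base if n == 1 else f"{base}page/{n}/"


def pagination_head_tags(n: int, last: int, canonical_to_first: bool = False,
                         base: str = BASE) -> list[str]:
    """Return the <head> tags for page n of a list with `last` pages.

    canonical_to_first=False: every page is self-canonical (our setup).
    canonical_to_first=True: every page's canonical points to page 1,
    for lists whose pages overlap heavily.
    """
    if not 1 <= n <= last:
        raise ValueError(f"page {n} is outside 1..{last}")
    tags = []
    if n > 1:
        # Keep later pages out of search results, but let crawlers follow links.
        tags.append('<meta name="robots" content="noindex, follow">')
    canonical = page_url(1, base) if canonical_to_first else page_url(n, base)
    tags.append(f'<link rel="canonical" href="{canonical}">')
    if n > 1:
        tags.append(f'<link rel="prev" href="{page_url(n - 1, base)}">')
    if n < last:
        tags.append(f'<link rel="next" href="{page_url(n + 1, base)}">')
    return tags


if __name__ == "__main__":
    print("\n".join(pagination_head_tags(2, last=5)))
```

Generating the tags from one function keeps the canonical, prev and next links consistent across every page, which is easy to get wrong when they’re edited by hand.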

URL Variations

Sometimes different URLs lead to the same page, for example because of:

  • session IDs and other URL parameters
  • UTMs
  • printer-friendly pages

URL parameters are codes added to the end of a URL, after a “?”. Session IDs use them to identify a visitor’s session, and many sites use them to identify a different version of the same page.

For example, consider an online store that offers a t-shirt design in different colours and sizes:

[Screenshots: two product page variants for the same t-shirt design and their URLs (source: chemicaldrip.com).]

When you select a variant, a variant parameter is added to the main URL:

  • main product: https://chemicaldrip.com/products/chmcldrp-t-shirt
  • product variant: https://chemicaldrip.com/products/chmcldrp-t-shirt?variant=44349794189556

Similarly, UTM parameters (short for Urchin Tracking Module) add codes to URLs that track marketing campaigns or trigger campaign-specific pop-ups:

https://chemicaldrip.com/products/chmcldrp-t-shirt?utm_campaign=springsale

Printer-friendly pages are an older technique for creating page variations that fit on A4 paper, without navigation menus, served at a separate URL:

  • original page: https://example.com/post
  • printer-friendly: https://example.com/print/post

Fixing URL Variations

All variations of a URL can get indexed by search engines, potentially causing duplicate content issues.

These can all be fixed by adding a canonical tag to each variation that points to the main page:

<head>
...
<link rel="canonical" href="https://chemicaldrip.com/products/chmcldrp-t-shirt">
...
</head>
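Working out which URL a variation should point to is mostly a matter of stripping the parts that don’t change the content. The Python sketch below does that for the examples above. The variant parameter and the utm_ prefix come from those examples; the session parameter names, the /print/ prefix and the function names are hypothetical. Some parameters, such as ?page=2, do change what’s on the page and must be kept, so adjust the lists to your own site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumptions: parameters that only create duplicates of the same page.
# "variant" and "utm_" come from the examples above; "sessionid" and "sid"
# are hypothetical session parameter names. Never strip parameters (like
# ?page=2) that change what the page shows.
DUPLICATE_PARAMS = {"variant", "sessionid", "sid"}
DUPLICATE_PREFIXES = ("utm_",)
PRINT_PREFIX = "/print/"  # hypothetical printer-friendly path prefix


def canonical_url(url: str) -> str:
    """Hypothetical helper: map a URL variation back to its main page URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in DUPLICATE_PARAMS
        and not key.lower().startswith(DUPLICATE_PREFIXES)
    ]
    path = parts.path
    if path.startswith(PRINT_PREFIX):
        path = "/" + path[len(PRINT_PREFIX):]  # /print/post -> /post
    return urlunsplit(parts._replace(path=path, query=urlencode(kept), fragment=""))


def canonical_tag(url: str) -> str:
    """Build the canonical tag to place in the <head> of any variation."""
    return f'<link rel="canonical" href="{canonical_url(url)}">'


if __name__ == "__main__":
    print(canonical_tag(
        "https://chemicaldrip.com/products/chmcldrp-t-shirt?variant=44349794189556"))
```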

Localization

Localized websites are optimized for multiple languages and regions so that they can rank higher on search engines for more audiences.

This can create duplicate content issues when your website serves different regions that share a language, such as the US, Canada, and the UK, which all use English.

Fixing Localization

To address this, you’ll need to implement the following for all of your localized pages:

  • self-referencing canonical tags for each region
    as the name suggests, these are canonical tags that point to the current page itself

    example:
    <link rel="canonical" href="https://searchfreaks.com/tech-seo/duplicate-content/">

    this is a self-referencing canonical tag for the page you’re reading
  • hreflang + alternate attributes
    combined, these attributes point to the alternate language and regional versions of the current page; each page needs one of these tags for every version, including itself

    example:
    <link rel="alternate" hreflang="fr-ca" href="https://searchfreaks.com/fr-ca/tech-seo/duplicate-content/">
    <link rel="alternate" hreflang="en-ca" href="https://searchfreaks.com/en-ca/tech-seo/duplicate-content/">

    this is how these tags might look if we had Canadian English and French versions of this page
  • x-default tag
    this attribute indicates the fallback page for visitors whose language or region doesn’t match any of your hreflang values

    example:
    <link rel="alternate" hreflang="x-default" href="https://searchfreaks.com/tech-seo/duplicate-content/">

    this indicates that this page (the one you’re reading on SearchFreaks) is the fallback page when a visitor’s language or region doesn’t match any of the listed versions
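Because every localized version needs the same set of hreflang tags, plus its own self-referencing canonical, it’s easier to generate them than to write them by hand. Here’s a minimal Python sketch; the function name is hypothetical, and the en-ca and fr-ca URLs are the hypothetical Canadian English and French versions from the example above (written with trailing slashes, to match our redirect policy).

```python
# Hypothetical helper for illustration.
def localization_head_tags(current_url: str, versions: dict[str, str],
                           x_default: str) -> list[str]:
    """Return the canonical, hreflang and x-default tags for one localized page.

    versions maps hreflang codes (e.g. "en-ca") to the URL of each version,
    including the current page's own version, so every page lists itself too.
    """
    if current_url not in versions.values():
        raise ValueError("current_url must be one of the listed versions")
    tags = [f'<link rel="canonical" href="{current_url}">']  # self-referencing
    for code, href in sorted(versions.items()):
        tags.append(f'<link rel="alternate" hreflang="{code}" href="{href}">')
    tags.append(f'<link rel="alternate" hreflang="x-default" href="{x_default}">')
    return tags


if __name__ == "__main__":
    # Hypothetical Canadian English and French versions of this article.
    versions = {
        "en-ca": "https://searchfreaks.com/en-ca/tech-seo/duplicate-content/",
        "fr-ca": "https://searchfreaks.com/fr-ca/tech-seo/duplicate-content/",
    }
    fallback = "https://searchfreaks.com/tech-seo/duplicate-content/"
    for url in versions.values():
        print("\n".join(localization_head_tags(url, versions, fallback)), end="\n\n")
```

Running it prints one block of tags per version; only the canonical line differs between them, so every version references every other version, including itself.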

Syndicated Content

Syndicating content means republishing existing content through other outlets, such as other websites and social media, or in different formats like blog posts and videos.

This isn’t a problem for content creators, unless they syndicate other people’s content without permission or credit, or outright plagiarize it.

In fact, content syndication is common in blogging because it’s a great way to reach new audiences.

That being said, even if you have permission or give credit for syndicating someone’s content, you can still face duplicate content issues.

Fixing Syndicated Content

You can avoid duplicate content errors by adding a canonical tag to the syndicated copy that credits the original source material:

<link rel="canonical" href="https://sourcematerial.com">
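If you syndicate content regularly, it’s worth checking that each republished copy actually carries that tag. The Python sketch below uses the standard library’s html.parser to collect every canonical link on a page and confirms there’s exactly one, pointing at the source. The class and function names are hypothetical, and it assumes you’ve already downloaded the page’s HTML and that the URLs must match exactly.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Hypothetical helper: collects the href of every rel="canonical" link."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values keep their case.
        # Self-closing <link ... /> tags also end up here by default.
        values = dict(attrs)
        rels = (values.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and values.get("href"):
            self.canonicals.append(values["href"])


def credits_source(html: str, source_url: str) -> bool:
    """True if the page has exactly one canonical tag and it points at source_url."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonicals == [source_url]


if __name__ == "__main__":
    copy = '<head><link rel="canonical" href="https://sourcematerial.com"></head>'
    print(credits_source(copy, "https://sourcematerial.com"))
```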

HTTP vs. HTTPS

Hypertext Transfer Protocol (HTTP) is the foundation of the web: it handles communication between servers and client machines, and it’s how you view web pages like this one.

In contrast, Hypertext Transfer Protocol Secure (HTTPS) is HTTP with an extra security layer. Communication between servers and clients over HTTPS is encrypted, which helps protect user data like login credentials, cookies, and the contents of the pages a user visits.

HTTPS is now the standard on the web, and most browsers warn users when they access a website that doesn’t use HTTPS:

[Screenshot: example of a browser warning for accessing a website over HTTP.]

Fixing HTTP and HTTPS Duplicate Content

Duplicate content errors can occur when web pages are served on both HTTP and HTTPS:

http://searchfreaks.com
https://searchfreaks.com

(if both of these load the same page, search engines can treat them as two different pages)

To prevent this, webmasters should make sure their website uses HTTPS by installing a valid SSL/TLS certificate, and then set up a 301 redirect from all HTTP requests to HTTPS.

301 redirecting from HTTP to HTTPS in Apache (.htaccess file):

Insert the following lines into your .htaccess file, replacing example.com with your actual domain:

RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://example.com/$1 [L,R=301]

301 redirecting from HTTP to HTTPS in Nginx:

Insert the following lines into your Nginx config, replacing example.com with your actual domain:

server {
        listen 80 default_server;
        listen [::]:80 default_server;
        server_name _;
        return 301 https://example.com$request_uri;
}
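The redirect decision itself is simple: plain-HTTP requests get a 301 to the HTTPS version of the same URL, and HTTPS requests are served as-is. As a Python sketch (the function name is hypothetical, and this only models the decision rather than serving it), the host is left as requested here, with the choice between www and non-www treated as a separate step:

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def https_redirect(url: str) -> Optional[str]:
    """Hypothetical helper: the 301 target for a plain-HTTP request, or None
    if the request already uses HTTPS and can be served as-is."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return None
    # Only the scheme changes; host, path and query string are kept.
    # Assumes the default port; an explicit :80 would also need to be dropped.
    return urlunsplit(parts._replace(scheme="https"))


if __name__ == "__main__":
    for u in ("http://searchfreaks.com", "https://searchfreaks.com"):
        print(u, "->", https_redirect(u) or "no redirect")
```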

www vs. non-www

Subdomains are often used to indicate a different part of a website:

  • https://app.example.com
  • https://shop.example.com
  • https://blog.example.com
  • https://www.example.com

Unlike other subdomains, www traditionally identifies the main website: the part of a domain meant to be accessed over the World Wide Web.

It usually points to the same server as the root (non-www) domain, so www.example.com and example.com typically serve the same pages.

Fixing www and non-www duplicate content

Because search engines treat www.example.com and example.com as separate sites, serving the same pages on both opens up possibilities for duplicate content errors.

To fix this, webmasters need to set up a 301 redirect to their preferred URL; if you prefer non-www, all www requests should be redirected to non-www, and vice versa.
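In other words, only requests for the exact non-preferred host are redirected, and everything else is left alone. A Python sketch of that logic, assuming example.com as the bare domain and an https:// target like the rules below (the function name is hypothetical):

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def host_redirect(url: str, domain: str, prefer_www: bool) -> Optional[str]:
    """Hypothetical helper: the 301 target that moves url onto the preferred
    host (www.domain or the bare domain), or None if no host redirect is needed.

    Only the exact non-preferred host is redirected, so other subdomains such
    as shop.example.com are left alone. The target always uses https://, and
    explicit ports are dropped (standard ports assumed).
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()  # hostname strips any :port
    bare, www = domain.lower(), "www." + domain.lower()
    preferred, other = (www, bare) if prefer_www else (bare, www)
    if host != other:
        return None
    return urlunsplit(parts._replace(scheme="https", netloc=preferred))


if __name__ == "__main__":
    print(host_redirect("http://example.com/blog/", "example.com", prefer_www=True))
```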

301 redirecting from non-www to www on Apache (.htaccess)

Insert the RewriteCond and RewriteRule lines after RewriteEngine On, and replace example.com with your actual domain.

...
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [L,R=301]
...

301 redirecting from www to non-www on Apache (.htaccess)

Insert the RewriteCond and RewriteRule lines after RewriteEngine On, and replace example.com with your actual domain.

...
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ https://example.com/$1 [L,R=301]
...

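301 redirecting from non-www to www on Nginx (conf file)

Add the following server block and replace example.com with your actual domain. To handle HTTPS requests as well, the block also needs the same listen 443 ssl and certificate directives as your main server block.

server {
        server_name example.com;
        return 301 https://www.example.com$request_uri;
}
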
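301 redirecting from www to non-www on Nginx (conf file)

Add the following server block and replace example.com with your actual domain. To handle HTTPS requests as well, the block also needs the same listen 443 ssl and certificate directives as your main server block.

server {
        server_name www.example.com;
        return 301 https://example.com$request_uri;
}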