Cloudflare markets itself as free of duplicate content problems, but after setting it up and using it for a while, we found that you can still run into web page duplication if you are not careful. As with most cache services, or any similar setup, duplicate pages can creep in, and they can get you hit by Google’s Panda algorithm. The problems listed here may not be unique to Cloudflare’s services, but the solutions we offer work directly with a Cloudflare / WordPress setup.
Table of Contents
Introduction
We have written a detailed guide to help you narrow down the causes of duplication, and to offer some quick and easy ways to fix them. Please note that, depending on your Cloudflare + WordPress setup, or on the way you set up your hosting service with Cloudflare, some of these problems may not exist.
The duplication issues I ran into are based on using a non-contracted hosting company: there was no direct Cloudflare setup via cPanel or from the host. I signed up for Cloudflare separately and used their DNS name server service. I also went the Cloudflare Flexible SSL (HTTPS) route.
This guide is written on the belief that any kind of web page duplication may be bad for your search engine ranking, so you should take every precaution to fix the places where duplication can occur.
WWW and Non-WWW duplication
HTTP and HTTPS canonical problem
Expired + Existing Cache Duplicate
WWW and Non-WWW duplication
Right after you set up WordPress with Cloudflare, if your blog originally ran under the non-WWW version, you will immediately run into a WWW duplicate of your home page. Before you add Cloudflare, a default WordPress setup redirects your home page to the preferred version automatically, without any additional .htaccess rules.
However, once you start running Cloudflare, that WordPress behavior gets overridden. You will find that your home page is now accessible via both the WWW and non-WWW versions. The problem may come from the way requests resolved from my www server when I first started the Cloudflare service.
After observing that the WWW version seemed to resolve on its own, we set up a forwarding page rule on Cloudflare.
Cloudflare Bug: When you first set up a rule, double-check that the forward is labeled as a 301 permanent redirect. When we first used it, Cloudflare registered the newly created rule as a 302 redirect. You will have to switch it to 301, then click on “update” again.
Note: I only observed this phenomenon when I first set up the Cloudflare service. After I put the WWW to non-WWW forwarding in place, the issue appeared to be fixed permanently. As a precaution, we recommend that you try to access the WWW version of your WordPress site after enabling Cloudflare, to be sure. We do not know whether this may be fixed at a later time.
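If you would rather script that precaution than check by hand, here is a minimal Python sketch. It assumes the non-WWW host is your preferred version; the host name `example.com` and the function names are our own placeholders, not anything Cloudflare or WordPress provides. `fetch_status` sends a single HEAD request without following redirects, and `check_www_redirect` interprets the result.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(host, path="/"):
    """Send one HEAD request and return (status, Location header).

    http.client does not follow redirects, so we see the raw response.
    """
    conn = http.client.HTTPSConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def check_www_redirect(status, location, preferred_host):
    """Classify the response received from the NON-preferred host."""
    if status == 200:
        return "duplicate: non-preferred host serves content directly"
    if status in (302, 303, 307):
        return "temporary redirect: switch the rule to 301"
    if status in (301, 308):
        target = urlsplit(location or "").hostname
        if target == preferred_host:
            return "ok"
        return "permanent redirect to unexpected host"
    return "unexpected status"
```

For example, `check_www_redirect(*fetch_status("www.example.com"), "example.com")` should come back as `"ok"` once the forwarding rule is in place and set to 301.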
To be safe, and to give Google additional signals about your WWW or non-WWW preference, we recommend registering your site with Google Webmaster Tools.
Register both the WWW and non-WWW versions of your WordPress site:
Go to Site Settings and select the preferred version. Note that this screenshot shows what happens when you have not registered both versions.
HTTP and HTTPS canonical problem
Search engines treat the HTTP and HTTPS versions of a site as two different sites. When you set up Cloudflare on your WordPress site, you may, depending on your settings, accidentally leave the site accessible via both protocols, which results in duplication issues.
You do not want your website to be accessible both ways, because you will basically double the number of your pages indexed by search engines.
To fix it, make sure that your canonical tag on WordPress is in place.
WordPress does this for you automatically after you set up the blog address as explained in our Cloudflare WordPress Flexible SSL guide.
Then, set up a Cloudflare page rule to always use HTTPS. This rule creates a 301 redirect from all of your HTTP pages to their HTTPS versions.
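To confirm that the canonical tag really points at the HTTPS version, you can inspect the HTML of a page. The sketch below is one way to do it with Python’s standard library. The class and function names are hypothetical helpers of ours, and `example.com` stands in for your own domain.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.canonical = attr.get("href")


def canonical_url(html):
    """Return the canonical URL declared in the page, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def canonical_is_https(html, preferred_host):
    """True only if the canonical URL uses HTTPS on the preferred host."""
    url = canonical_url(html)
    if not url:
        return False
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname == preferred_host
```

Feed it the source of a few posts and pages; anything that returns `False` is worth a closer look at your WordPress address settings.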
Expired + Existing Cache Duplicate
The last duplication issue can happen with any kind of cache plugin or service. Your website as a whole can run into duplication issues when you are not careful about cleaning up the cache regularly, or when you set the “live” time too long.
Most WordPress cache plugins serve your web pages as static HTML files, which decreases page generation time. However, these static HTML files can stay on your server, depending on how you configured the plugin. Now add Cloudflare on top of your cache setup. Cloudflare can take that static HTML and store it on its own servers as its own static cache (if you use the page rule to “Cache everything”). Set up correctly, this combination works extremely well for informational websites that rely on little dynamic generation.
However, if you are not careful about cleaning up those caches, either WordPress or Cloudflare can serve outdated files and return HTTP 200 to anyone requesting those pages.
Now imagine you make a major structural change to your site that moves URLs around, but you do not clear the cache, so the old URLs stay on the web with the same content. Even if you have redirects in place on your original WordPress site, Cloudflare still serves the old cached pages. Then you will run into some serious duplicate content issues.
Fix it by cleaning out your cache regularly. Check regularly that your indexed content behaves the way you intend. Never get your own site penalized just because of duplicate setup errors with Cloudflare + WordPress.
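One way to spot this situation is to look at the response headers of an old URL. Cloudflare adds a `CF-Cache-Status` header to responses, and cached responses usually carry the standard `Age` header (in seconds). The sketch below only interprets a dictionary of headers you have already fetched. The function name, the set of statuses we treat as served from cache, and the one-day threshold are our own assumptions, so adjust them to your setup.

```python
def describe_cache(headers, max_age_seconds=86400):
    """Summarize whether a response came from cache and how old it is.

    `headers` is a plain dict of response headers. The one-day default
    threshold is an arbitrary example value.
    """
    normalized = {key.lower(): value for key, value in headers.items()}
    status = normalized.get("cf-cache-status", "").upper()
    try:
        age = int(normalized.get("age", "0"))
    except ValueError:
        age = 0
    # Statuses we treat as "answered from Cloudflare's cache" (assumption).
    served_from_cache = status in {"HIT", "STALE", "UPDATING", "REVALIDATED"}
    return {
        "status": status or "UNKNOWN",
        "age": age,
        "served_from_cache": served_from_cache,
        "too_old": served_from_cache and age > max_age_seconds,
    }
```

If an old URL that should redirect instead returns HTTP 200 with `served_from_cache` set to true, the copy is coming from Cloudflare’s cache rather than from WordPress.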
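Besides the purge button in the Cloudflare dashboard, the cache can also be cleared through Cloudflare’s v4 API. The sketch below only builds the request, it does not send it. The zone ID and API token are placeholders you must supply yourself, and the helper name is ours. Treat it as a starting point and check Cloudflare’s current API documentation before relying on it.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_purge_request(zone_id, api_token, urls=None):
    """Build (but do not send) a cache purge request.

    With `urls` the request purges only those files; without it,
    it asks Cloudflare to purge everything in the zone.
    """
    payload = {"files": list(urls)} if urls else {"purge_everything": True}
    return urllib.request.Request(
        f"{API_BASE}/zones/{zone_id}/purge_cache",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send it, pass the result to `urllib.request.urlopen`. After a URL restructure, purging everything is the safer choice; for a single edited post, passing just that post’s URLs keeps the rest of the cache warm. Remember to clear your WordPress cache plugin as well, since Cloudflare will otherwise fetch the same stale HTML again.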