WordPress announced an important change to how it will block search engines from indexing websites. The change abandons the traditional Robots.txt solution in favor of the Robots Meta Tag approach. This is in line with the common intent for blocking Google, which is to keep the blocked pages from showing in Google’s search results.
This is the Robots Meta Tag that WordPress will use:
<meta name='robots' content='noindex,nofollow' />
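To check whether a page carries this tag, a short script can parse the HTML and collect the directives. The following is a minimal sketch using Python's standard `html.parser` module. The names `RobotsMetaParser`, `robots_directives`, `blocks_indexing` and the sample page are hypothetical and used only for illustration; they are not part of WordPress.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here by HTMLParser.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        # The content attribute is a comma-separated list, e.g. "noindex,nofollow".
        for directive in (attributes.get("content") or "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots meta directives present in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def blocks_indexing(html):
    """True if the page asks search engines not to index it."""
    return "noindex" in robots_directives(html)


# Hypothetical page containing the tag quoted in the article.
SAMPLE_PAGE = (
    "<html><head>"
    "<meta name='robots' content='noindex,nofollow' />"
    "</head><body>Hello</body></html>"
)

if __name__ == "__main__":
    print(sorted(robots_directives(SAMPLE_PAGE)))
    print(blocks_indexing(SAMPLE_PAGE))
```

A check like this only works on a page that can actually be downloaded, which is also why a crawler must be allowed to fetch the page before it can see the tag.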
Blocking Google From Indexing
It has long been a standard practice to use Robots.txt to block the “indexing” of a website.
In that context, the word “indexing” meant crawling of the site by Googlebot. By using the Robots.txt blocking feature you could stop Google from downloading the specified web page and, it was assumed, Google would be unable to show your pages in the Search Results.
But that robots.txt directive only stopped Google from crawling the page. Google was still free to add it to its index if it was able to discover the URL.
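The crawl-only nature of robots.txt can be illustrated with Python's standard `urllib.robotparser` module, which answers a single question: may this user agent fetch this URL? The sketch below assumes a blanket `Disallow: /` rule as the blocking entry; the article does not quote the exact rule, so that rule, the helper name `crawl_allowed` and the example URL are assumptions.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content that blocks every crawler from the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def crawl_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url (illustrative helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/some-page/"
    # The rule answers "may I download this?" and nothing else.
    print(crawl_allowed(BLOCK_ALL, "Googlebot", url))
    # An empty robots.txt places no restriction on crawling.
    print(crawl_allowed("", "Googlebot", url))
```

Nothing in this check says anything about whether the URL may be listed in an index. That decision lies outside what robots.txt expresses.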
So to keep a site out of the index, a publisher would block Google from “indexing” (that is, crawling) the pages, which wasn’t consistently effective.
WordPress 5.3 Will Truly Prevent Indexing
WordPress adopted the Robots.txt approach. But that’s changing in version 5.3.
Currently, when a publisher selects “discourage search engines from indexing this site,” WordPress adds an entry to the site’s robots.txt that prohibits Google from crawling the site.
Starting with WordPress 5.3, WordPress will adopt the more reliable Robots Meta Tag approach for preventing the indexing of a website.
This change will affect the “discourage search engines from indexing this site” setting.
This change is an improvement. WordPress publishers can be more confident that the blocked web pages will not be shown in Google’s search results.
Why Did WordPress Use Robots.txt?
WordPress relied on Robots.txt for blocking the indexing of a website because that’s how everybody kept pages from showing in Google’s search results. That was the standard way of doing it.
Yet even though everybody did it that way, as has been explained, it was an unreliable approach.
The word “indexing” has two meanings:
- Indexing means crawling, as when Googlebot visits and downloads web pages.
- Indexing can also mean adding a web page to Google’s database of web pages (which is called The Index).
Blocking Google from “indexing” a web page in the first sense keeps Googlebot from seeing the page, but Google can still index the page in the second sense and add its URL to Google’s index.
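The difference between the two meanings can be written down as a toy decision function. This is a deliberately simplified model of the behavior described in this article, not Google's actual logic; the function name and its three boolean parameters are hypothetical.

```python
def can_appear_in_results(url_discovered, crawl_blocked, page_has_noindex):
    """Toy model: could a URL end up in the search index?

    url_discovered   -- the search engine knows the URL exists (e.g. via a link)
    crawl_blocked    -- robots.txt forbids downloading the page
    page_has_noindex -- the page contains a robots meta tag with noindex
    """
    if not url_discovered:
        # An unknown URL cannot be indexed at all.
        return False
    if crawl_blocked:
        # The page is never downloaded, so a noindex tag on it is never seen.
        # The bare URL can still be added to the index.
        return True
    # The page is crawled, so the robots meta tag is read and honored.
    return not page_has_noindex


if __name__ == "__main__":
    # Robots.txt block only: crawling stops, indexing remains possible.
    print(can_appear_in_results(True, crawl_blocked=True, page_has_noindex=False))
    # Robots meta tag on a crawlable page: kept out of the index.
    print(can_appear_in_results(True, crawl_blocked=False, page_has_noindex=True))
```

The model also shows why the two mechanisms should not be combined carelessly: a page that is blocked from crawling cannot communicate its noindex directive.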
Robots.txt Versus Robots Meta Tag
Keeping a web page out of Google’s index was not the intent of the Robots.txt solution. Doing that is the job of the Robots Meta Tag.
So it’s good to see WordPress embrace the Robots Meta Tag as the solution to blocking web pages from showing in the search engines.
WordPress 5.3 is scheduled to be released in November 2019.
Read the WordPress announcement:
Changes to Prevent Search Engines Indexing Sites
Read Google’s Authoritative Documentation
- Robots meta tag and X-Robots-Tag HTTP header specifications
- Block search indexing with ‘noindex’