For the last few months I’ve had a problem with Google Webmaster Tools complaining that I have pages that it can’t fetch.
Previously I had Not Found errors for pages Google had crawled before I got my permalinks sorted out after moving hosts. For those I added mod_rewrite rules to my .htaccess, like so:
\# Fix broken links that Google remembers and tries to follow now and then
RewriteRule ^blog/2007/07/17/topcoder-single-round-match-257/ /blog/2005/08/09/topcoder-single-round-match-257/ [R=301,L]
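For anyone who doesn't read mod_rewrite: the rule matches the old path and answers with a permanent (301) redirect, and `L` stops any later rules from running. Below is a rough Python model of that one rule. It assumes the path arrives without its leading slash, as it does for rules in an Apache .htaccess (the IIS rewrite plug-in may differ in details). The `REDIRECTS` table and `old_link_redirect` helper are made up for illustration.

```python
import re

# Hypothetical table mirroring the RewriteRule above: (pattern, target).
# Like the original rule, the pattern has no trailing "$", so it matches
# any path that *starts* with the old permalink.
REDIRECTS = [
    (r"^blog/2007/07/17/topcoder-single-round-match-257/",
     "/blog/2005/08/09/topcoder-single-round-match-257/"),
]


def old_link_redirect(path):
    """Return (301, target) if path matches a remembered old link, else None.

    path is the request path without its leading slash, which is how
    per-directory (.htaccess) rules see it.
    """
    for pattern, target in REDIRECTS:
        if re.match(pattern, path):
            # [R=301,L]: send a permanent redirect and stop at the first match
            return 301, target
    return None


print(old_link_redirect("blog/2007/07/17/topcoder-single-round-match-257/"))
# (301, '/blog/2005/08/09/topcoder-single-round-match-257/')
```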
But for the pages Google currently reports as 404s, my web browser redirects correctly. So I used the Fetch as Googlebot Labs tool and got a 404:
HTTP/1.1 404 Not Found
Date: Tue, 25 Jan 2011 18:25:58 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET,PHP/5.2.14
X-Pingback: https://simeonpilgrim.com/blog/xmlrpc.php
Content-Type: text/html; charset=UTF-8
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Last-Modified: Tue, 25 Jan 2011 18:25:58 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Pragma: no-cache
Refresh: 0;url=https://simeonpilgrim.com/blog/2005/08/04/curse-of-the-azure-bonds-project/
Content-Length: 0
So I can see why the browser works: the response is a 404 with an empty body, but the browser follows the Refresh header straight to the real page, so no one is the wiser. Googlebot, however, just records the 404, and Google doesn't like that.
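The browser and the crawler disagree because they act on different parts of the same response. The following small Python model uses the headers above. The browser side follows a `Refresh: 0;url=...` header. The crawler side is a simplification that only trusts the status code and a `Location` header. I don't know exactly how Googlebot treats `Refresh`; the crawler function just reproduces what Fetch as Googlebot reported. All function names are illustrative.

```python
import re


def refresh_target(value):
    """Return the URL from a 'Refresh: <seconds>;url=<target>' value, or None."""
    match = re.match(r"\s*\d+\s*;\s*url\s*=\s*(\S+)", value, re.IGNORECASE)
    return match.group(1) if match else None


def browser_view(status, headers):
    """Roughly what a browser ends up doing (header names matched exactly)."""
    if 300 <= status < 400 and "Location" in headers:
        return ("redirect", headers["Location"])
    if "Refresh" in headers:
        target = refresh_target(headers["Refresh"])
        if target:
            # the status is ignored: the page is replaced by the target
            return ("redirect", target)
    return ("show", status)


def crawler_view(status, headers):
    """Simplified crawler: only a 3xx status plus Location counts as a move."""
    if status in (301, 302, 303, 307, 308) and "Location" in headers:
        return ("redirect", headers["Location"])
    return ("status", status)


OLD = {"Refresh": "0;url=https://simeonpilgrim.com/blog/2005/08/04/curse-of-the-azure-bonds-project/"}
print(browser_view(404, OLD)[0])  # redirect
print(crawler_view(404, OLD))     # ('status', 404)
```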
At first I thought it might be a problem with the IIS mod_rewrite plug-in I use, but that has been working for ages. After reading up on HTML redirects I started to suspect WordPress, once I realised that the failing pages were ones whose permalinks I had renamed. So I went digging in the code and found this:
//wp-includes/pluggable.php
function wp_redirect($location, $status = 302) {
    global $is_IIS;
    $location = apply_filters('wp_redirect', $location, $status);
    $status = apply_filters('wp_redirect_status', $status, $location);
    if ( !$location ) // allows the wp_redirect filter to cancel a redirect
        return false;
    $location = wp_sanitize_redirect($location);
    if ( $is_IIS ) {
        header("Refresh: 0;url=$location");
    } else {
        if ( php_sapi_name() != 'cgi-fcgi' )
            status_header($status); // This causes problems on IIS and some FastCGI setups
        header("Location: $location", true, $status);
    }
}
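Reading the function, the IIS branch sends only a `Refresh` header and never sets a status code. The response therefore keeps whatever status was already in place, which for these renamed pages was the 404 seen above. The other branch sends a `Location` header. PHP's `header()` with a third argument also forces the response code, so even the `cgi-fcgi` case gets the intended status. The following Python model covers just that branching. It is not WordPress code; `redirect_headers` and its `patched` flag are made up for illustration.

```python
def redirect_headers(location, status=302, is_iis=False, patched=False):
    """Model what the wp_redirect() quoted above sends.

    Returns None if the redirect is cancelled, otherwise a dict with the
    status code it sets (None = left unchanged) and the headers it adds.
    patched=True models the version with the IIS check commented out.
    """
    if not location:
        # an empty location from the wp_redirect filter cancels the redirect
        return None
    if is_iis and not patched:
        # IIS branch: no status_header() call and no Location header
        return {"status": None, "headers": {"Refresh": f"0;url={location}"}}
    # Either status_header() runs, or (under cgi-fcgi) header()'s third
    # argument forces the code; either way the client gets `status`.
    return {"status": status, "headers": {"Location": location}}


url = "https://simeonpilgrim.com/blog/2005/08/04/curse-of-the-azure-bonds-project/"
print(redirect_headers(url, 301, is_iis=True)["status"])                # None
print(redirect_headers(url, 301, is_iis=True, patched=True)["status"])  # 301
```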
So I commented out the IIS check:
//simeon: IIS check disabled, so IIS also gets a Location header and a real status code
//if ( $is_IIS ) {
//    header("Refresh: 0;url=$location");
//} else {
    if ( php_sapi_name() != 'cgi-fcgi' )
        status_header($status); // This causes problems on IIS and some FastCGI setups
    header("Location: $location", true, $status);
//}
The page still loads for me, so I tested it with Fetch as Googlebot, and now Googlebot is happy:
HTTP/1.1 301 Moved Permanently
Date: Tue, 25 Jan 2011 18:51:18 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET,PHP/5.2.14
X-Pingback: https://simeonpilgrim.com/blog/xmlrpc.php
Content-Type: text/html; charset=UTF-8
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Last-Modified: Tue, 25 Jan 2011 18:51:18 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Pragma: no-cache
Location: https://simeonpilgrim.com/blog/2005/08/04/curse-of-the-azure-bonds-project/
Content-Length: 0
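To see a page the way a crawler does, it's enough to make one request and not follow redirects. The following Python sketch uses the standard library's `http.client`, which never follows redirects on its own. To keep it self-contained, `demo()` starts a throwaway local server that replays the before and after responses above instead of hitting the real blog. `fetch_once`, `ROUTES`, `ReplayHandler` and `demo` are illustrative names, and the server only imitates the relevant headers.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

TARGET = "https://simeonpilgrim.com/blog/2005/08/04/curse-of-the-azure-bonds-project/"

# Relevant parts of the two responses above: before and after the patch.
ROUTES = {
    "/before": (404, [("Refresh", "0;url=" + TARGET)]),
    "/after": (301, [("Location", TARGET)]),
}


def fetch_once(url):
    """GET url a single time (no redirect following); return (status, headers)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        # lower-case header names so lookups don't depend on server casing
        return resp.status, {k.lower(): v for k, v in resp.getheaders()}
    finally:
        conn.close()


class ReplayHandler(BaseHTTPRequestHandler):
    """Answer each path in ROUTES with its canned status and headers."""

    def do_GET(self):
        status, headers = ROUTES.get(self.path, (404, []))
        self.send_response(status)
        for name, value in headers:
            self.send_header(name, value)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the console quiet


def demo():
    """Serve ROUTES on a free local port and fetch each one once."""
    server = HTTPServer(("127.0.0.1", 0), ReplayHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        base = "http://127.0.0.1:%d" % server.server_address[1]
        return {name: fetch_once(base + name) for name in ROUTES}
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for name, (status, headers) in demo().items():
        print(name, status, headers.get("location"), headers.get("refresh"))
```

Pointing `fetch_once` at a real URL works the same way. A 301 with a `location` header is the healthy result, while a 404 carrying only a `refresh` header is the broken one.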
I'm not sure what problems on IIS the original code was working around for WordPress users, and I hope I've not opened a can of worms, but for now this is just another customisation to keep track of.