Interesting Things

So, I was perusing the logs for a new site that I just set up and I noticed a weird page view in there: /this_is_a_test_of_404_response. At first I thought this was some joker just seeing what would happen if they tried to get a page that didn’t exist on my site, so no big deal. Still, I was intrigued, so I went into the actual raw logs and saw that this wasn’t some joker, it was Google doing this. Unless someone has forged their bot signature and IP address range, it really is them. Well, that’s peculiar, I thought.
Even more interested, I went through the logs for my other sites. These all have a higher volume, so things like a single page view often go unnoticed by me. But there they were, on every site, in the logs for the month of October. Definitely weird. Next question: why is Google doing this? Unsurprisingly, I got very few search results for this new weirdness, since it appears to have started just within the last week. What did come up seemed to revolve around people thinking that Google was doing some kind of AdWords quality check on the sites where AdWords ads run. That makes sense, although they’re doing this to all of my sites, even the ones without AdWords on them. This leads me to conclude that Google is actually doing a full-scale quality-of-service check on any site that they’ve indexed. They want to see what will happen when pages move and also to see how well your site is built. For those people out there who build static, unscripted sites, this is going to mean trouble because unless you run your own server, it can be tough to set up a default 404 page, as this is usually done in the configuration files on the server itself. For those of us that script, it’s a pretty easy ‘else’ clause to toss in at the end of any conditionals, which we should have been doing anyway, right?
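Just to make that ‘else’ clause concrete, here’s a minimal sketch in Python. The PAGES table and the handle_request function are names I made up for illustration; your own scripts will look different, but the shape of the conditional should be about the same.

```python
# Hypothetical page table; a real site would pull these from templates or a database.
PAGES = {
    "/": "Welcome to the home page.",
    "/about": "All about this site.",
}


def handle_request(path):
    """Return a (status_code, body) pair for the requested path."""
    if path in PAGES:
        return 200, PAGES[path]
    else:
        # The catch-all branch: anything we don't recognize gets a real 404
        # status plus a friendly body that points back into the site.
        return 404, "Sorry, %s isn't here. Try the home page at / instead." % path


status, body = handle_request("/this_is_a_test_of_404_response")
print(status, body)
```

The part that matters is that the fallback sends an actual 404 status along with the friendly page, rather than quietly serving the home page with a 200.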
I doubt that matching on the requested URL will find this every time, as once Google sees that people are on to this they will change ‘this_is_a_test_of_404_response’ to some other wording. I also don’t believe that this is something they do yearly or have done in the past, as I ran a grep on two-year-old logs and couldn’t find anything like this. Does it spell trouble for you? I don’t know. Luckily for me, some kind of error handling is built into all my sites but one, so that people see some form of the actual site and not the generic Apache 404 error. But some were handling it better than others. I’ve gone through and made sure that they all handle it well now. As for that last site, it should be redesigned in the next month or so and I’ll be sure to have a good error-handling routine built into the scripts.
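Since the exact wording of the test URL could change, a more durable check than grepping for the literal string is to look for any 404 that was served to a client identifying itself as Googlebot. Here’s a rough sketch of that, assuming your logs are in Apache’s combined log format. The find_bot_404s function and the two sample lines (including the IP addresses and dates) are invented for illustration and are not copied from my logs. Keep in mind that a user agent string can be forged, so this only narrows things down.

```python
import re

# Pulls the request path, status code, and user agent out of a
# combined-format access log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def find_bot_404s(lines, agent_substring="Googlebot"):
    """Return the paths of 404 responses served to a matching user agent."""
    hits = []
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # Skip lines that are not in the expected format.
        if match.group("status") == "404" and agent_substring in match.group("agent"):
            hits.append(match.group("path"))
    return hits


# Made-up sample lines, for illustration only.
sample = [
    '192.0.2.10 - - [12/Oct/2007:06:25:24 +0000] '
    '"GET /this_is_a_test_of_404_response HTTP/1.1" 404 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.11 - - [12/Oct/2007:06:26:01 +0000] '
    '"GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

print(find_bot_404s(sample))
```

To run it against a real log, pass an open file object instead of the sample list, since the function just iterates over lines.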
Gotta love Google. They cracked the whip by giving better search preference to sites with better code, which forced us all to write better code. Now they’re doing this to light a fire under server administrators. I’m sure some people will bitch about how heavy-handed it is, but they’re doing their part to keep the net as clean as it can be by making sure that content is always reachable.