I have my site and my comments on the internet scraped by those really annoying zero-value-add sites (you know, the ones that are 80% advertising), and the worst thing is they don’t scrape very well. Text on the next line gets mixed into the URL, Google sees these bad links, follows them, and they 404 on my server. So now my Webmaster Tools crawl error screen is filling up with rubbish.
Ideally there would be a button to say “ignore this link, it’s junk”, and have Google silently ignore that inbound link. Basically a down-vote for the bad link.
But that doesn’t happen, so I’ve been resorting to adding more and more RewriteRules to my .htaccess to make those bad links resolve, hoping Google will notice and remove them. But it takes so long.
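A minimal sketch of the kind of rule involved (the URLs here are made-up examples, not my actual paths — adjust the patterns to whatever the scrapers mangled):

```apache
RewriteEngine On

# Hypothetical example: a scraper glued the next line's text onto the URL,
# e.g. /2011/05/my-post/Some-stray-text -> /2011/05/my-post/
# The 301 tells Google the bad URL permanently moved, so it should
# eventually drop it from the crawl error list.
RewriteRule ^(2011/05/my-post)/.+$ /$1/ [R=301,L]

# Hypothetical example: strip trailing junk appended after .html,
# e.g. /my-post.htmlRead-more -> /my-post.html
RewriteRule ^(.+\.html).+$ /$1 [R=301,L]
```

The important bit is `R=301` rather than a plain rewrite: a permanent redirect gives Google something to index instead of a 404, rather than just silently serving the page at the broken address.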
You can check them with Fetch as Googlebot, and it reports success, but this doesn’t feed back into the error results.
So another nice feature would be a “request a recheck of current errors” button.
So I ponder how bigger websites (those that get actual traffic) deal with this, as it reduces the value the tools provide, and makes you question whether acting on them has any impact.
Update: I was getting ready to ask about the above on the Webmaster Tools forum, but followed the ‘Please Read First’ post, and sure enough my questions are answered in the FAQ. Still, it’s really ugly. It’s like a C++ build full of warnings that were fixed in the code three months ago: it makes it really hard to notice the introduction of new problems.