For example, Netscape is still flooding Google's search index with crap, as defined by Google's own quality guidelines, which clearly state:
"Use robots.txt to prevent crawling of search results pages [...] that don't add much value for users coming from search engines."

Netscape.com lacks a robots.txt entirely, but how many URL patterns would Google need to identify these pages as SERPs? Next, search.netscape.com does have a robots.txt, but it lacks a Disallow: / directive, or at least Disallow rules for all of the scripts generating search results.
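For illustration, here is a minimal robots.txt sketch of the kind search.netscape.com could serve to comply with the guideline; the script paths (/search and /search.php) are hypothetical placeholders, not Netscape's actual URLs:

    # Keep all crawlers out of autogenerated search result pages
    User-agent: *
    Disallow: /search
    Disallow: /search.php

A blanket Disallow: / would of course keep crawlers out of the whole subdomain, but disallowing just the search scripts is enough to stop the SERPs from being indexed.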
Is it that simple to get gazillions of useless autogenerated pages ranking at Google? Indeed. Following the Netscape precedent, every assclown out there can buy an SE script, crawl the Web for a bunch of niche keywords, and earn free Google traffic just because he has "forgotten" to upload a proper robots.txt file and Google isn't capable of detecting SERPs. I mean, if they don't even run a few tests with Netscape's SERPs, what's the point of an unenforced no-crawlable-SERPs policy?
I just found another interesting snippet in Google's quality guidelines:
"If a site doesn't meet our quality guidelines, it may be blocked from the index."

I certainly will not miss 1,360,000 URLs from a spamming site ;)