(...)
"We try to avoid crawling spam and other bad content," says Annau. "I think other engines have a crawl first ask questions later policy. One efficiency we gain is just to not crawl splogs [spam blogs] and other machine-generated gibberish."
Nearly all of the machine-generated content on the Web is produced precisely to entrap the search engine spiders that crawl it and to cram their indexes with ad-laden pages. Avoiding these sites altogether, using spam-detecting algorithms and human curation, saves Blekko enormous amounts of resources.