Online publishers face a dilemma: Allow AI scraping from Google or lose search visibility

Blocking the company’s AI overviews also blocks other aspects of Google's search experience.


As the US government weighs its options following a landmark “monopolist” ruling against Google last week, online publications increasingly face a bleak future. (And this time, it’s not just because of severely diminished ad revenue.) Bloomberg reports that their choice now boils down to allowing Google to use their published content to produce inline AI-generated search “answers” or losing visibility in the company’s search engine.

The crux of the problem lies in Googlebot, the crawler that scours and indexes the live web to produce the results you see when you enter search terms. If publishers block Google from using their content for the AI-produced answers you now see littered at the top of many search results, they also lose the privilege of appearing in other Google search features like snippets and Discover.

Google uses a separate crawler for its Gemini (formerly Bard) chatbot, but its AI Overviews are generated using data from its main crawler. A Google spokesperson told Engadget that blocking an entire article from AI Overviews would not prevent its crawler from seeing "the full text of what's provided to us for ranking purposes" or from "being indexed and appearing in our web search results."

The spokesperson also said the company's tools for publishers allow sites to block only certain sections of a page from features like snippets or AI Overviews. Web publishers nonetheless have no way to fully prevent AI Overviews without impacting overall search performance.
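To make the section-level blocking concrete, here is a rough sketch that assumes the tool the spokesperson had in mind is the `data-nosnippet` HTML attribute, which Google documents for span, div and section elements. The sample page and the `SnippetEligibleText` helper are hypothetical, and the script only shows which text a publisher has marked off-limits; it does not reproduce how Google actually processes a page.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not tracked on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# Hypothetical page: one section is marked with data-nosnippet.
PAGE = """
<article>
  <p>Public summary of the repair guide.</p>
  <div data-nosnippet>
    <p>Step-by-step instructions the publisher does not want excerpted.</p>
  </div>
  <p>Closing note.</p>
</article>
"""


class SnippetEligibleText(HTMLParser):
    """Collects text that is NOT inside an element marked data-nosnippet."""

    def __init__(self):
        super().__init__()
        self.stack = []     # one flag per open element: are we inside a blocked subtree?
        self.eligible = []  # text fragments left available for previews

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        parent_blocked = bool(self.stack) and self.stack[-1]
        marked = any(name == "data-nosnippet" for name, _ in attrs)
        self.stack.append(parent_blocked or marked)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        blocked = bool(self.stack) and self.stack[-1]
        text = data.strip()
        if text and not blocked:
            self.eligible.append(text)


def eligible_text(html: str) -> list[str]:
    parser = SnippetEligibleText()
    parser.feed(html)
    parser.close()
    return parser.eligible


if __name__ == "__main__":
    for fragment in eligible_text(PAGE):
        print(fragment)
```

The marked section stays on the page and remains visible to the crawler for ranking; the attribute only asks that it be left out of previews, which is why this kind of control cannot opt a page out of AI Overviews wholesale without also giving up snippets.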

The catch-22 has led publications, rival search engines and AI startups to pin their hopes on the Justice Department. On Tuesday, The New York Times reported that the DOJ is considering asking a federal judge to break up parts of the company (spinning off sections like Chrome or Android). Other options it’s reportedly weighing include forcing Google to share search data with competitors or to relinquish its default search-engine deals, like the $18 billion one it inked with Apple.

iFixit CEO Kyle Wiens told Bloomberg, “I can block ClaudeBot [Anthropic’s crawler for its Claude chatbot] from indexing us without harming our business. But if I block Googlebot, we lose traffic and customers.”
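Wiens' point can be illustrated with a robots.txt file, the standard mechanism sites use to turn crawlers away. The sketch below is a hypothetical configuration, not iFixit's actual file: it blocks ClaudeBot and the Google-Extended token (which, as far as Google's documentation goes, governs Gemini-related use rather than Search) while leaving Googlebot alone. The `example.com` URL and the `allowed` helper are placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative publisher site.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def allowed(agent: str, url: str = "https://example.com/guide/") -> bool:
    """Return True if the given crawler may fetch the URL under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("ClaudeBot", "Google-Extended", "Googlebot"):
        print(agent, "allowed" if allowed(agent) else "blocked")
```

Nothing in this file offers a separate switch for AI Overviews. Because they draw on the same Googlebot crawl that feeds search results, the only robots.txt lever that would affect them is blocking Googlebot itself, which is the trade-off Wiens describes.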

A sample Google search query with an AI Overview answer. (Google)

Another problem with tying AI Overviews to the main search crawler is that it gives Google an immeasurable advantage over smaller AI startups. The company gets a plethora of free training data from publishers eager to remain visible in search. In contrast, AI companies are forced to pay publishers for access to their data, and even then, it wouldn’t add up to the mother lode Google gets (essentially) for free.

From that perspective, it isn’t surprising to read that, according to Bloomberg, Google is spurning publishers that try to negotiate content deals. (Reddit has been the lone exception.) Why waste money on content deals when it gets all the training data it wants in exchange for the search results most publishers need to survive?

“Now you have a bunch of tech companies that are paying for content, they’re paying for access to that because they need it to be able to compete in any kind of serious way,” Alex Rosenberg, CEO of AI startup Tako Inc., told Bloomberg. “Whereas for Google, they don’t really have to do that.”

It comes down to leverage, which Google wields over desperate publishers. On top of the industry’s existing financial troubles (online ad revenue has fallen off a cliff over the past eight years), AdWeek reported in March that Google’s AI-generated search answers could lead to a 20 to 60 percent drop in organic search traffic.

The ball is now in the Justice Department’s court to figure out where Google — and, to an extent, the entire web — goes from here. Bloomberg’s full story is worth a read.

Update, August 16, 4:55PM ET: This story was updated after publishing to include a number of clarifications from a Google spokesperson.

The original article said that preventing an article from being used in AI Overviews could block Google's web crawler from including it in search results. The spokesperson asserted that this was incorrect; while blocking an entire article from its AI Overviews does prevent it from being included in other search enhancements like snippets and Google Discover, it does not block it from standard web results.

The article's subheading has been amended to reflect the changes, and a full statement from Google follows:

“Every day, Google sends billions of clicks to sites across the web, and we intend for this long-established value exchange with websites to continue. With AI Overviews, people find Search more helpful and they’re coming back to search more, creating new opportunities for content to be discovered. People are using AI Overviews to discover more of the web, and we’re continuing to improve the experience to make that even easier.”