Farewell, the French Bulldog.
Some days I fear for the future of the human race with Generative AI (GenAI) and Large Language Model (LLM) research being industrialized at breakneck speed. And then some days the parlour trick is exposed.
There’s this fascinating idea floating around that AI might be – in a weird twist of poetic justice – slowly destroying itself. A paper published in Nature1 showed how AI models that eat their own outputs are, after a few short iterations, reduced to nonsense in a phenomenon researchers are calling model collapse.
Two examples explain this better than I could:
The End of the French Bulldog

- Imagine an AI trained to recognise or describe different dog breeds.
- It starts with data on lots of breeds – say, golden retrievers, bulldogs, pugs, and some rarer dogs like Azawakhs or Mudi.
- In repeated cycles, the AI begins focussing on the most common breeds (majority data: retrievers, labs, etc.) and essentially drops mention of the lesser-known ones.
- Amnesia sets in, and eventually the lesser-known breeds such as the French Bulldog vanish from the corpus.
- Worse, due to hallucination (the technical term AI researchers use for making things up), it has invented a few wacky non-existent breeds for good measure.
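The amnesia loop above can be sketched as a toy simulation. This is not how LLM training actually works – it just shows the statistical core of the problem: each generation resamples from the previous generation’s output, and finite sampling silently drops the tails of the distribution. The breed names and counts are made up for illustration:

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the toy run is repeatable

# Hypothetical starting corpus: common breeds dominate, rare ones barely appear.
corpus = Counter({"golden retriever": 480, "labrador": 420,
                  "pug": 60, "french bulldog": 25,
                  "azawakh": 10, "mudi": 5})

def next_generation(counts, size=1000):
    """One train-on-your-own-output cycle: draw a fresh corpus from the
    current breed distribution. Sampling can only repeat breeds it has
    seen, so rare breeds can vanish but never reappear."""
    breeds = list(counts)
    weights = [counts[b] for b in breeds]
    return Counter(random.choices(breeds, weights=weights, k=size))

for generation in range(10):
    corpus = next_generation(corpus)
    print(generation, len(corpus), "breeds survive")
```

Run it a few times with different seeds and the Azawakhs and Mudis tend to disappear within a handful of generations, while the retrievers and labradors soldier on. The real mechanism in the Nature paper is subtler (and involves hallucinated additions, which this sketch omits), but the tail-eating dynamic is the same.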
The 14th-Century Church Steeple Example
- Researchers had a pre-trained AI generate a wiki article on old English church steeples.
- Initially, it explained the historical topic accurately.
- After each new round of training that relied on the AI’s own updated content, things got warped.
- By the ninth iteration, the steeple text devolved into a bizarre lecture on jack-tailed rabbits in various colours – completely unrelated to architecture or history.
- The repeated AI-driven updates replaced genuine facts with made-up gibberish about bunnies.
In both cases2, if the system never goes back to real human-created information, the AI’s version of “facts” rapidly descends into garbage.
Human nature being what it is, we are busy flooding the web with GenAI-created content. And what do large language models train on? Why, every bit of language they can scrape from the web!
You see the problem?
We may be in a golden age of GenAI hype that is sowing the seeds of its own destruction. The only casualty might be the web as we know it.
Mind you, Google has already done a good job of shaping the modern web into a sludge of SEO-optimized animal feed. So maybe the web was dead already. With the sole exception of this dear blog of course, and maybe The Verge, which is still run by humans, I think.