I asked Micro$oft's Copilot preview version about it, and got:
Learn more - the links Copilot returned should all work (click the text, not the boxes); test them.
That was not where I started considering it. What triggered questioning a bot about bot difficulties? I watched "The Turing Lectures: The future of generative AI," which is an hour and a half long and which I recommend to any reader curious about the topic.
(Coincidentally, and apart from AI bot considerations, an hour-long Stanford interview with Jensen Huang, Founder and CEO of Nvidia, is also worth the time taken to view it. As someone who has not written code, I find it refreshing that each linked video says the future will be bots doing the coding, which rings my bell and should resonate with others interested in new directions without having to write code.)
So, basically, this is a pitch for two videos, with text in support. Next come online items aside from what the bot returned, perhaps overlapping, found via two search engine queries:
https://duckduckgo.com/?t=ffab&q=model+collapse&ia=web
https://scholar.google.com/scholar?hl=en&as_sdt=0%2C24&q=LLM+model+collapse
The two arXiv items Copilot returned are technical computer science papers, 18 and 16 pages long, by credible scholars. Such items are difficult to read, while the other items are popular reports any intelligent person can follow. The Google Scholar returns are likewise from a technical literature search, while the DuckDuckGo returns are popular news and analyses.
All for now. After posting, Crabgrass will be studying stuff about Model Collapse.
Some readers will not care to pursue things further, beyond at least trying out the videos, and that is expected - but some may be intrigued, so have at it.
__________UPDATE__________
Can you imagine an image search for building architecture becoming too slanted because, in the original training data set, a web crawler picked up and retained too many fresh images from Gaza?
While that thought settles in: OpenAI (and the derivative Micro$oft Copilot stuff) are proprietary, closed, black box systems. It's technological "Trust me."
Is this any answer? Or is open source just a distinction without a difference, in terms of model collapse as more and more web content becomes artificially generated because it's cheaper than using human beings to write stuff?
Then, think, is life more interesting with dilemmas, or is certitude best? Or is your answer biased by personality traits, such as tidiness, or drug use?