How long until LLMs can talk to each other?

Seriously. I’m a newbie & probably underqualified to be here. However, if it benefits humans to collaborate, why wouldn’t it also benefit LLMs to ‘speak’ with & refer questions to one another? The output of one becomes the input of another? Kind of like a set of iterative conditional forest simulations.

Sorry if this is a repeat or obviously flawed question. I just can’t help but wonder this & have for a long time.



That is happening already, every day. I’ve built a few of those myself.

Now, if the question is: when will they do that by themselves, autonomously, consciously, with no human intervention? hmmm I don’t know.
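For anyone curious what the plumbing looks like, here is a minimal sketch of the relay pattern Jess describes, where one model’s output becomes another model’s input. The `ask_researcher` and `ask_reviewer` functions are hypothetical stand-ins; in a real system each would wrap a call to an actual LLM API:

```python
# Toy "LLM relay": each function below is a stand-in for a call to a real
# model API. The routing logic is the point: model 2 consumes model 1's output.

def ask_researcher(question: str) -> str:
    # Placeholder for a first model that drafts an answer.
    return f"Draft answer to: {question}"

def ask_reviewer(draft: str) -> str:
    # Placeholder for a second model that critiques and refines the draft.
    return f"Reviewed and refined: {draft}"

def relay(question: str) -> str:
    draft = ask_researcher(question)  # model 1 produces a draft
    final = ask_reviewer(draft)       # model 2 takes that draft as its input
    return final

print(relay("How long until LLMs can talk to each other?"))
```

In practice frameworks add routing, retries, and shared memory on top, but the core loop is just this hand-off.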

Hi @Jess_Behrens

Welcome to the community.

This has both potential benefits and challenges. On one hand, LLMs collaborating can expand knowledge and enhance problem-solving. On the other, they face issues with consistency, quality control, resource demands, privacy, ethics, and technical complexity. Addressing these challenges requires robust protocols and guidelines.

best regards

The existential question is: can the first LLM detect when the other LLM is lying, and what does the first LLM do about it? Does it just add the lies to its data set and repeat them to others?


Hi @Jess_Behrens

There is one interesting problem with that, called model collapse.


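To make the idea concrete, here is a toy simulation (my own sketch, not from the paper) of the feedback loop behind model collapse: fit a simple Gaussian model to data, generate new “data” from the fit, refit, and repeat. Estimation noise compounds across generations and the learned distribution degenerates:

```python
import random
import statistics

# Toy model collapse: each "generation" is a model trained purely on the
# previous generation's output. With a finite sample, the fitted spread
# drifts, and over many generations it collapses toward zero.

def fit_and_resample(data, rng):
    mu = statistics.fmean(data)      # fit the model (mean and spread)
    sigma = statistics.pstdev(data)
    return [rng.gauss(mu, sigma) for _ in data]  # generate synthetic data

rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(100)]  # "real" human data
spread = [statistics.pstdev(data)]
for generation in range(2000):                     # train on own output
    data = fit_and_resample(data, rng)
    spread.append(statistics.pstdev(data))

print(f"spread of generation 0: {spread[0]:.3f}")
print(f"spread after 2000 generations: {spread[-1]:.3f}")
```

The spread shrinks drastically: the model forgets the diversity of the original data, which is the heart of the collapse phenomenon.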

Hi @arvyzukai,

That was an interesting read. It seems generated data feeds further model collapse. Does this mean that eventually predictive models will no longer hold any significance?

The link focuses mostly on the creation of synthetic data from AI-generated content, which suggests we cannot totally rely on AI-generated apps and algorithms.

Also, what about models used more for exploratory analysis, predicting response variables from predictor variables? Would those models suffer the same model collapse?

Can model collapse be prevented through qualitative analysis of the synthetic or generated data before it is used in future training?

I mean, if I look at AI from a human-brain perspective: humans of course do not remember every memory or event, but significant milestones and basic reasoning become part of human thinking through experience. Can this be incorporated into an AI algorithm?? :slight_smile:



Hi @Deepti_Prasad

Wow, these are good questions :slight_smile:

The link I provided was meant to be simple because the OP mentioned being a newbie. The original paper The Curse Of Recursion: Training On Generated Data Makes Models Forget is much richer in detail.

I bet people will come up with ways to tackle this problem (the “pollution” of the dataset with AI content).

:sweat_smile: I suppose you’re joking. But to be on the safe side - yes, never “rely” on the internet, let alone AI-generated content (which is trained on the internet in the first place :slight_smile: ). Don’t get me wrong, I’m not saying these tools are not useful (they are), but “rely” is a bit of a strong word.

I’m not sure what you mean, but exploratory analysis would also suffer from AI-generated content; the problem of distinguishing what is AI-generated and what is not is difficult. AI-generated content can and does shift the underlying distributions.

They mention in the paper that it is unclear how content generated by LLMs can be tracked at scale.

OpenAI gave up on trying to distinguish between AI-written and human-written text on July 20, 2023:

As of July 20, 2023, the AI classifier is no longer available due to its low rate of accuracy. We are working to incorporate feedback and are currently researching more effective provenance techniques for text, and have made a commitment to develop and deploy mechanisms that enable users to understand if audio or visual content is AI-generated.

So I don’t think there is a solution in the near future.

I agree with your point - I think we (humans) have an advantage of operating in the real world with lots of different stimuli.

I can imagine that being in a coma, with all senses absent and only my own thoughts, would be something like “model collapse”.
Kids need to interact with the environment (toys, varied experiences, and also others, like their mothers, to help with learning); this is how they build a model of the world (and, I think, a far more sophisticated one than current LLMs have). But I guess the absence of further experiences would also make their model collapse.
In that regard, Reinforcement Learning is more like that, but it also has its share of problems.

So yeah, I think there are a lot of smart people that are coming up with different ideas and hopefully will come up with something :slight_smile:


Hello @arvyzukai,

Thank you for replying in detail.

I was going through the original paper and I already have more questions to ask!! I hope I am not annoying :grimacing:

Could the issue of model collapse be solved or mitigated by introducing datasets from literature archives or published websites, since those would be human-generated rather than model-generated data??

Could the spread of synthesized/AI-generated data be countered by fine-tuning methods that add parameters or labelled features marking human-generated data?

Basically, the article you shared asks what happens when music created by human composers and played by human musicians trains models whose output then trains other models, which again causes model collapse because of learning from data produced by other models.

So what if we build a model around the 8 elements of music (in alphabetical order: Dynamics, Form, Harmony, Melody, Rhythm, Texture, Timbre, and Tonality), and then add test/dev datasets from a platform like Spotify’s music library to train an AI-based music model!!! (Don’t tell me Spotify is again AI-generated data :frowning: then it won’t work either.)

I think handling the pollution of the dataset would start by requiring every trained model to come close to human-level accuracy, with variation. The main problem I see with AI-generated models is that we keep choosing the model that gives the best accuracy at the least cost. If one wants to create an artificial-intelligence algorithm, one needs to understand that even human intelligence is not an algorithm of pure accuracy; it also involves logical and situational thinking, which needs to be incorporated into AI through human-generated literature or feedback.

By reliability I meant how everyone is now using ChatGPT to get answers to any query :joy: :wink: which I totally disagree with. I am sorry if someone doesn’t like this response.

I guess creating LLMs, image detection, or video recognition individually will always have its own pros and cons. Wouldn’t it be great to create a model combining all of them, just as the human brain has a memory library of learnt data plus image memory plus event-based video memory? That would be a robust model :slight_smile:
What say!!!



Wow, this thread is so amazing.


Hi, @Deepti_Prasad

No problem at all, I’m happy to interact :slight_smile:

They are already trained on literature archives (at least to my knowledge), but the problem from this point forward is that it is hard to automate detecting what is genuine human content and what is not. For example, I’m sure there is already LLM-created content falsely dated back to the 1800s, or hallucinations in which the LLM thinks the content is from that time, and manually filtering that out is the hard part.

I’m sorry, I don’t understand what you mean by that. Can you elaborate?

Improving the underlying architectures of LLMs, the loss functions, the optimizers, or the way we choose models could be a way out of this problem, but at the moment they are the best we have got.

I think we got off topic a little bit. So let me get things straight first and then get back to previous point:

Jess’ original question is “How long until LLMs can talk to each other?”. And basically the answer to that is, as Juan Olano mentioned, - they already do.

The point I wanted to raise was that conversations between these LLMs (even the best ones) create a problem: polluting their own dataset for their future versions (model collapse).
The heart of the problem is that “probable events get over-estimated” and “improbable events get under-estimated” (the figure in the linked paper illustrates this well).
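A toy sketch of that head/tail effect (my own illustration, not from the paper): repeatedly re-estimate a categorical distribution from a finite sample of its own output. Any rare category that happens to get zero draws disappears permanently, so the tail is progressively lost and probability mass concentrates on the head:

```python
import random
from collections import Counter

# Re-estimating a distribution from its own samples: rare ("improbable")
# categories that get zero draws vanish for good, while common ("probable")
# categories soak up their lost probability mass.

def resample_distribution(probs, n_samples, rng):
    categories = list(probs)
    weights = [probs[c] for c in categories]
    counts = Counter(rng.choices(categories, weights=weights, k=n_samples))
    # Categories with zero draws get probability 0 and are dropped forever.
    return {c: counts[c] / n_samples for c in categories if counts[c] > 0}

rng = random.Random(42)
# Zipf-like starting distribution: 20 categories, heavy head, thin tail.
probs = {i: 1.0 / (i + 1) for i in range(20)}
total = sum(probs.values())
probs = {c: p / total for c, p in probs.items()}

surviving = [len(probs)]
for generation in range(30):
    probs = resample_distribution(probs, n_samples=100, rng=rng)
    surviving.append(len(probs))

print("surviving categories per generation:", surviving)
```

The count of surviving categories only ever shrinks: once a tail event is under-estimated down to zero, no amount of further self-training can bring it back.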

Later in our discussion, I wanted to share a further thought: the underlying architectures, or the way we choose models, might not be the real problem; the real problem could be the lack of diversity in the dataset.
We humans receive a lot of stimuli and diversity every second, and, as we know, our understanding of the world is just a small fraction of what happens in it (we don’t see certain parts of the spectrum, we don’t feel magnetic fields, neutrinos, etc., and that is just a small part of what we know that we don’t know). So, in some sense, we also reduce (or compress/abstract) the “universe”. And sociologically we are also susceptible to echo chambers and cliques. So maybe the only way out of the problem is letting LLMs interact with the real world (not just the internet), and that sounds scary to a lot of people :slight_smile:

Just my thoughts.

I’d very much like to see any of these you’ve built.

Hi Jess
I had a similar question. In fact, some ML models are trained to play against themselves, many times over, to master games against humans, as was done for Go, Dota, and modern chess reinforcement-learning models.
Moreover, in vision there are Generative Adversarial Networks, where image generation plays against recognition (“discrimination”), improving both.
However, LLMs assign probabilities to dictionary words (or the sub-word pieces that make up tokens), well polished by data and then by humans via RLHF. I believe chatting between models would mostly confirm the existing weights, so the marginal value would be limited. If so, why not just take the weights and continue adjusting from that checkpoint?
New models are, in fact, evaluated by more advanced ones for research purposes, but the most advanced models need real-world data to get better :wink: It is the same as needing a test set instead of a training set when evaluating a model: data the model has already seen won’t test its quality and is incrementally unlikely to improve it (it would just be another small training step).