How is ChatGPT able to generate a response given a prompt?

I am going through the AI For Everyone session 1 lecture by Andrew on Coursera ("What machine learning can and cannot do"). The lecture states that, given a prompt or question, a response cannot be generated. Is this statement still true, given the ChatGPT hype we are seeing in the market lately? How can I learn more about the transformers used in conversational assistants?

It’s a good point that other people have noticed recently as well. The state of the art is always moving in ML/DL/AI and this is a good example. Here’s another recent thread about this same question.

If your question is specifically how ChatGPT accomplishes what it does, I have not actually read any articles on that. I’m sure a search will turn up plenty. The topic of Transformers is covered here in DLS Course 5 and also in the NLP Specialization Course 4.

Hi, @vkbalasub!

As we all know, deep learning research is developing incredibly fast, so certain statements made in the past may become outdated pretty quickly.

We have seen that some models, like ChatGPT, can actually generate answers to your questions in a really convincing, human-like style (though the answers may not be accurate). The way it works is based on the original GPT-3 model plus some pretty clever Reinforcement Learning from Human Feedback (RLHF), as explained in the InstructGPT paper. Here's a quick summary from its abstract:

Making language models bigger does not inherently make them better at following a user’s intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters. Moreover, InstructGPT models show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.
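If it helps to make the "rankings of model outputs" part concrete, here is a minimal toy sketch of the reward-model training step that the abstract describes. This is not OpenAI's actual code: the real reward model is itself a large transformer scoring full prompt/response pairs, while here the "responses" are just random embeddings and the reward model is a single linear layer, so the example runs anywhere. The pairwise loss, however, is the same idea as in the InstructGPT paper: push the score of the response the labelers preferred above the score of the one they rejected.

```python
# Toy sketch of the InstructGPT-style reward-model objective (not OpenAI's code).
import torch
import torch.nn.functional as F

torch.manual_seed(0)

embed_dim = 16   # pretend each candidate response is already embedded as a vector
num_pairs = 32   # pairs of (preferred, rejected) responses from human labelers

# Fake data standing in for embeddings of two responses to the same prompt,
# where labelers preferred the first one over the second.
preferred = torch.randn(num_pairs, embed_dim)
rejected = torch.randn(num_pairs, embed_dim)

# Reward model: maps a response embedding to a scalar score.
reward_model = torch.nn.Linear(embed_dim, 1)
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-2)

for step in range(100):
    r_pref = reward_model(preferred).squeeze(-1)   # scores for preferred responses
    r_rej = reward_model(rejected).squeeze(-1)     # scores for rejected responses
    # Pairwise ranking loss: -log sigmoid(r_preferred - r_rejected),
    # which pushes the preferred response's score above the rejected one's.
    loss = -F.logsigmoid(r_pref - r_rej).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Once trained, a reward model like this scores new model outputs, and that score is the signal the final reinforcement-learning stage (PPO in the paper) uses to fine-tune the language model itself, on top of the supervised fine-tuning done on labeler demonstrations.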


Thanks for the guidance and the links. I will read through the paper.