Hello,
as I am learning in this course, solving problems requires both coding and prompt-engineering skills. Sometimes it is not easy to create the right prompt, and I have found that a prompt in my local notebook on my computer sometimes does something different than in the course notebook.
I assume that is because my local notebook works with gpt-3.5 while the course notebook uses a newer model.
But I am wondering how to deal with that in a business environment. Would it be best practice not to change a model once it is running? You might use a whole set of prompts, and it could be very difficult to test and verify whether they are backward-compatible with a new model.
Or would you bind a prompt to a model (get_llm_response(prompt, "gpt-4o-mini"), or get_llm_response(prompt, "*") with * standing for the latest model) and use several models in parallel? What are best practices here?
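To illustrate what I mean, here is a rough sketch (the get_llm_response function, the PROMPTS registry and the model names are only placeholders I made up, not a real course API):

```python
# Rough sketch of binding each prompt to the model it was designed for.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Each prompt records the model it was designed and tested with.
PROMPTS = {
    "summarize": {"model": "gpt-3.5-turbo", "template": "Summarize this text:\n{text}"},
    "classify":  {"model": "gpt-4o-mini",   "template": "Classify the sentiment of:\n{text}"},
}

def get_llm_response(task: str, model: str | None = None, **kwargs) -> str:
    """Run the prompt for `task`, defaulting to the model it was designed for."""
    spec = PROMPTS[task]
    chosen_model = model or spec["model"]  # pass a model explicitly to override the pinned one
    prompt = spec["template"].format(**kwargs)
    response = client.chat.completions.create(
        model=chosen_model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```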
Hi @Jochen_Keilitz,
One of the things that I have realised with prompting and LLMs is that, since this is a fairly new field, models are being updated quite frequently.
So a prompt that was working as expected with, let’s say, gpt-3.5 a month ago may no longer bring back the same results. While there could be other reasons for this, one of them can be that the model got updated and now behaves differently with the exact same prompt.
In a business environment (for which, I must mention, I don’t have a lot of experience when it comes to going to production and scaling with LLMs), you firstly have to keep an eye on the behaviour of the LLMs by monitoring them from time to time. And yes, sometimes changing the model version will help. Another technique is to have a fixed “system prompt”; if you don’t know what is meant by that, don’t worry, you’ll learn it in the last module. Other times, caching is used.
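As a quick sketch (just my own illustration, the model name and wording are made up), a fixed system prompt is simply a system message that never changes while the user message varies:

```python
# Minimal sketch of a fixed "system prompt": the system message stays constant,
# only the user message changes from request to request.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a customer-support assistant. "
    "Answer in at most three sentences and never invent product names."
)

def answer(user_message: str, model: str = "gpt-4o-mini") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content
```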
Hope this helps,
Mubsi
Every approach comes with pros and cons.
When you say your company used gpt-3.5, you had built your prompts around the kind of responses that version gave to your inputs. Not changing the model will likely keep giving you that same kind of response, the one your prompt was built for. But remember that an LLM keeps improving from version to version: it might still respect your prompt, yet you have to ask whether the prompt still does its job once you move to the latest version of the LLM, which may bring more information and different embeddings.
That being said, in a business environment where the prompt is task-specific, relies on agent-to-agent interaction, and the response depends largely on how the interface agent reacts, data-parallel training plays a significant role, and several steps are involved (see the sketch after the list below):
- Replication of the model, so each replica trains on its own subset of the dataset or prompt-response pairs.
- Input splitting, where the input is split into small batches and each small batch is sent to a different GPU.
- Parallel forward pass, where each GPU computes the forward pass independently on its portion of the input data, using the model parameters that are shared among all the GPUs.
- Gradient computation: after the forward pass, each GPU independently computes the loss and the gradients with respect to its copy of the parameters.
- The gradients calculated on each GPU are then synchronised across all GPUs by a gradient all-reduce and averaged before the update. If the gradient produced on GPU 1 is dl1 and the one from GPU 2 is dl2, the gradient used by both GPUs is (dl1 + dl2) / 2.
- With the synchronised gradient, the update step is performed: each GPU updates its copy of the parameters using the aggregated gradient.
- The updated parameters are thereby consistent across all the GPUs, so every replica keeps the same model parameters.
- When the next batch is introduced, the whole process is repeated over the training dataset.

If a single GPU has multiple cores, each core handles a portion of the workload.
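Here is a tiny toy sketch of the gradient-averaging step above, simulated with NumPy on one machine instead of real GPUs (the numbers and the loss function are made up for illustration):

```python
# Toy illustration of data-parallel gradient averaging: two "GPUs" each compute
# a gradient on their shard of the batch, then the gradients are averaged.
import numpy as np

def loss_grad(w, x, y):
    # gradient of the squared error 0.5 * (w*x - y)**2 with respect to w
    return (w * x - y) * x

w = 0.5                                             # replicated model parameter
batch = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 7.0)]

# Input splitting: each replica gets half of the batch
shard_1, shard_2 = batch[:2], batch[2:]

# Parallel forward/backward pass: each replica computes its own gradient
dl1 = np.mean([loss_grad(w, x, y) for x, y in shard_1])
dl2 = np.mean([loss_grad(w, x, y) for x, y in shard_2])

# All-reduce: average the gradients so both replicas make the same update
grad = (dl1 + dl2) / 2
w = w - 0.1 * grad                                  # synchronised parameter update
print(grad, w)
```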
Hello Muhammad and DP,
thank you very much for your answers. Maybe there is a misunderstanding.
I did not experience different responses when using the same model in the lessons; I experienced differences when, for example, I ran the same prompt in the course environment (using gpt-4o-mini) and in my local environment (using gpt-3.5).
I can imagine that even with the same model, answers might vary depending on how the model works internally and maybe also on how it evolves over time.
But I can imagine (and saw while doing the lessons) that differences are a fact when using different models. For example, for one prompt the gpt-4o-mini model started to hallucinate while the gpt-3.5 model did not (so there would have been no reason to take measures against that when the prompt was created).
I also understand that models get better: maybe their answers become more accurate (for example in summaries), or they cope better with less strictly worked-out prompts.
But as long as a new model has not been carefully tested with the current prompts, it might be necessary to keep running them on the old model before you upgrade the model in your business. So for some time, new prompts would run with the new model and current prompts with the previous one.
That is why I had the idea to bind a prompt to the model it has been designed for.
So I am curious how this is addressed in real world business applications.
The reason behind different responses with different GPT versions partly comes down to the seed parameter. Seeds are essentially random numbers used to initialise the sampling process; unless you fix the seed, the results are not deterministic, and even with a fixed seed the outputs are only mostly reproducible.
Check the link below, which explains this:
https://platform.openai.com/docs/advanced-usage#reproducible-outputs
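As a small illustration (the model name, seed value and prompt below are just examples I picked), you can pin the seed and the temperature in the API call:

```python
# Minimal sketch of requesting (mostly) reproducible outputs with the seed
# parameter, as described in the OpenAI link above.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    seed=42,        # same seed + same prompt + same model -> largely repeatable output
    temperature=0,  # also reduce sampling randomness
    messages=[{"role": "user", "content": "Give three facts about the Moon."}],
)

print(response.choices[0].message.content)
print(response.system_fingerprint)  # changes when the backend configuration changes
```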
Hi @Jochen_Keilitz,
I believe Deepti pretty much covered the “why” this happens.
Personally (and again, as I mentioned, I don’t have experience with LLMs in production), I imagine it is done as you have guessed: you “bind a prompt to a model”.
One of the phrases I liked from one of our courses was, “Taking a truck for a quick grocery run is a bit of an overkill”. It basically means that if something can be done in a simple way, don’t complicate it by using larger LLMs.
One thing I can tell you, as you also mentioned in one of the threads about the models being used in the course, is that for simple prompts (tasks) the team used the gpt-3.5 model, because it gave the expected results; using a larger (or the latest) model, such as gpt-4o-mini, didn’t make sense, especially in terms of cost.
When gpt-3.5 stopped giving the expected output, the team started using gpt-4o-mini.
So I imagine this is the case you faced as well, using gpt-3.5 with prompts that returned better results with gpt-4o-mini.
We saw a similar case in one of our crewAI courses. The course was designed using the gpt-4 model but published using gpt-3.5, so there were a lot of complaints about the behaviour of the model (the prompts (tasks) were a bit too complicated for the 3.5 model to handle).
So I imagine that in business applications you go to production and deployment with the combination of prompts and associated models that worked well in the development phase, and you don’t change it.
I hope I made sense.
Best,
Mubsi
Hello Mubsi and Deepti,
thank you very much for your responses. I think I now have a much better understanding and picture of what happens and why.
What you say totally makes sense, and I think it may be best practice to at least have a comment in a prompt stating which model it has been designed and tested for (or to check it programmatically). In a way, a prompt is “literal” code in combination with a specific model.
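Something as simple as this would already document and enforce the binding (the names are just my own illustration):

```python
# Small sketch: the prompt carries the model it was designed and tested for,
# and a helper warns if it is rendered for a different model.
import warnings

SUMMARY_PROMPT = {
    "designed_for": "gpt-3.5-turbo",
    "text": "Summarize the following review in one sentence:\n{review}",
}

def render_prompt(prompt: dict, model: str, **kwargs) -> str:
    if model != prompt["designed_for"]:
        warnings.warn(
            f"Prompt was designed and tested for {prompt['designed_for']} "
            f"but is being used with {model}; re-test before trusting the output."
        )
    return prompt["text"].format(**kwargs)

text = render_prompt(SUMMARY_PROMPT, "gpt-4o-mini", review="Great battery, weak camera.")
```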
I am curious how things will evolve.
With best regards
Jochen
From a business perspective, as far as I understand, you can make your prompt calls explicit about the seed parameter and the temperature for both versions of GPT, so that the log probs of the LLM you are using stay aligned with your task-specific (business-specific) needs; that way you can get much more consistent responses from whichever version of GPT you are using.
Remember that a prompt’s behaviour also depends on the number of tokens, the embeddings (vectors), and the temperature you assign based on your business model.
You will understand this better once you come across the transformer model, where the attention mechanism plays its role in building encoder-decoder question-answering models.
You are a curious and humble learner, which I can see from the questions you ask, so one day you will probably create an LLM with task-specific prompts yourself and share your experience.
So keep learning!!!
Regards
DP