In Module 2 > Ungraded Lab 1, the suggested flow is: Model 1 (Generate Code) → Execute → Take Code + Generated Graph → Model 2 (Critique code and generate refined code).
Question: Considering that some models are better at generation (e.g. GPT-4o) while others are better at reviewing/reasoning (say, Claude), should the 'critique code' and 'generate refined code' steps be separate, so that generating the refined code goes to a model suited for generation instead of one suited for reasoning?
Or maybe there are benefits to asking the same model to both identify problems and solve them, since it might be better at eliminating the issues it identified than a generation model asked to address them?
So which is the better option?
Option 1: Model 1 (Generate Code) → Execute → Take (Code + Generated Graph) → Model 2 (Critique code + generate refined code).
Option 2: Model 1 (Generate Code) → Execute → Take (Code + Generated Graph) → Model 2 (Critique code) → Model 3 or Model 1 (take the critiques as input and generate refined code).
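For concreteness, here is a minimal sketch of the two flows in Python. The `call_model()` and `execute()` helpers and the model names ("generator", "critic") are hypothetical placeholders, not the lab's actual API:

```python
def call_model(model: str, prompt: str) -> str:
    """Placeholder: swap in the lab's actual LLM client (OpenAI, Anthropic, etc.)."""
    raise NotImplementedError

def execute(code: str) -> str:
    """Placeholder: run the generated code and capture the graph it produces."""
    raise NotImplementedError

def option_1(task: str) -> str:
    # One model both critiques the output and rewrites the code.
    code = call_model("generator", f"Write plotting code for: {task}")
    graph = execute(code)
    return call_model(
        "critic",
        f"Critique this code and its output, then return refined code.\n"
        f"Code:\n{code}\n\nGraph:\n{graph}",
    )

def option_2(task: str) -> str:
    # Critique and refinement are split across two model calls.
    code = call_model("generator", f"Write plotting code for: {task}")
    graph = execute(code)
    critique = call_model(
        "critic",
        f"List the problems with this code and its output.\n"
        f"Code:\n{code}\n\nGraph:\n{graph}",
    )
    return call_model(
        "generator",  # or a third model suited to generation
        f"Refine the code to address every point in this critique.\n"
        f"Code:\n{code}\n\nCritique:\n{critique}",
    )
```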
@TMosh Let’s define 'better' as: final code that incorporates all the feedback from Model 2.
Is there an advantage to having the same model do the review/critique and then generate the refined code (as opposed to leveraging the reasoning abilities of one model for critiquing and then using a different model for generation because it excels at that)?
It seems to me that the only reasonable answers to your question are ‘it depends’ and ‘just try and see’.
As an example, I once used a (local) model to produce high-level pseudo-code and then another model, known for its better coding capabilities, to produce the final code. This did not work very well, so I reverted to using the first model for both the high-level pseudo-code and the final code. Maybe I did not prompt the second model the right way? Or maybe the first model simply fits my ideas about coding better? With every new version of a model, things may change.
So I would say that you could just try both options and see which you like better, and, if it is possible to use (objective) evaluation criteria, which produces better results. Apart from that, one advantage of using the same model, locally or in production, may be that the loading and generation processes are less complex.
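If you do have an objective metric, that comparison can be as simple as scoring both pipelines on the same set of tasks. A rough sketch, reusing the hypothetical `option_1`/`option_2` functions above and assuming a `score()` function you define for your own task:

```python
def score(final_code: str) -> float:
    """Placeholder: your objective metric, e.g. does the code run,
    does the graph pass a checklist, is all critique feedback addressed."""
    raise NotImplementedError

def compare(tasks: list[str]) -> dict[str, int]:
    # Run both pipelines on the same tasks and tally which scores higher.
    wins = {"option_1": 0, "option_2": 0}
    for task in tasks:
        if score(option_1(task)) >= score(option_2(task)):
            wins["option_1"] += 1
        else:
            wins["option_2"] += 1
    return wins
```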
@reinoudbosch thanks for the thoughtful answer. This does sound like one of those things where there isn’t a definite answer and experimentation may be the only way.
I appreciate you taking the time to respond. Thanks!
‘claude-3-7-sonnet’ did not work when I tried it. You can try ‘claude-3-7-sonnet-latest’ instead. It worked for me.
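In case it helps, and assuming the lab calls the Anthropic Python SDK directly (I haven't verified how the lab wires this up), swapping the model name would look something like:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-3-7-sonnet-latest",  # instead of "claude-3-7-sonnet"
    max_tokens=1024,
    messages=[{"role": "user", "content": "Critique this code ..."}],
)
print(response.content[0].text)
```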
I’ll let the staff know about the issue with ‘claude-3-7-sonnet’ so they can take it off the list of suggested models in the lab.
One suggestion: in the future, please create a new post for your questions rather than adding to an existing one. It will make it easier for mentors to notice your question, and it also keeps the threads cleaner and easier for future learners to search for posts that might help them.