In Module 2 > Ungraded Lab 1, the suggested flow is: Model 1 (Generate Code) → Execute → Take Code + Generated Graph → Model 2 (Critique code and generate refined code).
Question: Considering that some models are better at generation (e.g. GPT-4o) while others are better at reviewing/reasoning (say, Claude), should the 'critique code' and 'generate refined code' steps be separate, so that generating the refined code goes to a model suited for generation instead of one suited for reasoning?
Or maybe there are benefits to asking the same model to both identify problems and solve them, since it might be better at eliminating the issues it identified than a generation model asked to address them?
So which is the better option?
Option 1: Model 1 (Generate Code) → Execute → Take (Code + Generated Graph) → Model 2 (Critique code + generate refined code).
Option 2: Model 1 (Generate Code) → Execute → Take (Code + Generated Graph) → Model 2 (Critique code) → Model 3 or Model 1 (take the critiques as input and generate refined code).
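For concreteness, here is a minimal sketch of the two flows in Python. The `call_model()` and `execute()` helpers and the model names ("generator", "critic") are hypothetical placeholders, not the lab's actual API:

```python
def call_model(model: str, prompt: str) -> str:
    """Placeholder: swap in the lab's actual LLM client (OpenAI, Anthropic, etc.)."""
    raise NotImplementedError

def execute(code: str) -> str:
    """Placeholder: run the generated code and capture the graph it produces."""
    raise NotImplementedError

def option_1(task: str) -> str:
    # One model both critiques the output and rewrites the code.
    code = call_model("generator", f"Write plotting code for: {task}")
    graph = execute(code)
    return call_model(
        "critic",
        f"Critique this code and its output, then return refined code.\n"
        f"Code:\n{code}\n\nGraph:\n{graph}",
    )

def option_2(task: str) -> str:
    # Critique and refinement are split across two model calls.
    code = call_model("generator", f"Write plotting code for: {task}")
    graph = execute(code)
    critique = call_model(
        "critic",
        f"List the problems with this code and its output.\n"
        f"Code:\n{code}\n\nGraph:\n{graph}",
    )
    return call_model(
        "generator",  # or a third model suited to generation
        f"Refine the code to address every point in this critique.\n"
        f"Code:\n{code}\n\nCritique:\n{critique}",
    )
```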
@TMosh Let’s define 'better' as: final code that incorporates all the feedback from Model 2.
Is there an advantage to having the same model do the review/critique and then generate the refined code (as opposed to leveraging the reasoning abilities of one model for critiquing and then using a different model for generation because it excels at that)?
It seems to me that the only reasonable answers to your question are ‘it depends’ and ‘just try and see’.
As an example, I once used a (local) model to produce high-level pseudo-code and then another model, known for its better coding capabilities, to produce the final code. This did not work very well, so I reverted to using the first model for both the high-level pseudo-code and the final code. Maybe I did not prompt the second model the right way? Or maybe the first model simply fits my ideas about coding better? With every new version of a model, things may change.
So I would say that you could just try both options and see which you like better, and, if it is possible to use (objective) evaluation criteria, which produces better results. Apart from that, one advantage of using the same model, locally or in production, may be that the loading and generation processes are less complex.
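If you do have an objective metric, that comparison can be as simple as scoring both pipelines on the same set of tasks. A rough sketch, reusing the hypothetical `option_1`/`option_2` functions above and assuming a `score()` function you define for your own task:

```python
def score(final_code: str) -> float:
    """Placeholder: your objective metric, e.g. does the code run,
    does the graph pass a checklist, is all critique feedback addressed."""
    raise NotImplementedError

def compare(tasks: list[str]) -> dict[str, int]:
    # Run both pipelines on the same tasks and tally which scores higher.
    wins = {"option_1": 0, "option_2": 0}
    for task in tasks:
        if score(option_1(task)) >= score(option_2(task)):
            wins["option_1"] += 1
        else:
            wins["option_2"] += 1
    return wins
```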
@reinoudbosch thanks for the thoughtful answer. This does sound like one of those things where there isn’t a definite answer and experimentation may be the only way.
I appreciate you taking the time to respond. Thanks!
‘claude-3-7-sonnet’ did not work when I tried it. You can try ‘claude-3-7-sonnet-latest’ instead. It worked for me.
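In case it helps, and assuming the lab calls the Anthropic Python SDK directly (I haven't verified how the lab wires this up), swapping the model name would look something like:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-3-7-sonnet-latest",  # instead of "claude-3-7-sonnet"
    max_tokens=1024,
    messages=[{"role": "user", "content": "Critique this code ..."}],
)
print(response.content[0].text)
```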
I’ll let the staff know about the issue with ‘claude-3-7-sonnet’ so they can take it off the list of suggested models in the lab.
One suggestion: in the future, please create a new post for your questions rather than adding to an existing one. It will make it easier for mentors to notice your question, and it also keeps the threads cleaner and easier for future learners to search for posts that might help them.