Hello
I have gone through the research papers on CoT, PAL, and ReAct, and these are the questions I have:
- In CoT, do we fine-tune a pretrained LLM with CoT prompts? (I mean, do we give a question followed by a chain of thought and then let the model generate the answer?)
- At inference time we don't provide any CoT, yet the model is still able to generate good answers on reasoning tasks. How does that work?
- Can you explain how exactly we train/fine-tune using CoT, PAL, or ReAct? Specifically, what are the inputs and outputs during fine-tuning: are the inputs context + CoT prompts and the outputs just the answers, or are the inputs just the context and the outputs CoT + answers?
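To make that last question concrete, here is a minimal sketch in plain Python of the two supervision formats I'm asking about. The example strings are made up for illustration and are not taken from any of the papers:

```python
# Two candidate ways to turn a (question, chain-of-thought, answer) triple
# into a supervised fine-tuning example. All text below is a made-up
# illustration, not an example from the CoT/PAL/ReAct papers.

question = "Roger has 5 balls. He buys 2 cans of 3 balls each. How many balls does he have now?"
cot = "Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11."
answer = "11"

# Format A: the CoT is part of the *input* (prompt); the training target is
# only the final answer.
format_a = {
    "input": question + "\n" + cot,
    "output": answer,
}

# Format B: the input is only the question; the *target* is the CoT followed
# by the answer, so the model is trained to generate the reasoning itself.
format_b = {
    "input": question,
    "output": cot + "\nThe answer is " + answer + ".",
}

print(format_a["output"])  # just the answer
print(format_b["output"])  # reasoning chain followed by the answer
```

My question is essentially which of these two shapes the training examples take (my current guess is Format B, since that would explain why the model can produce reasoning at inference time without being shown a CoT).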