I’m working on a Gen-AI powered application that acts as a mentor to help users practice conversations for specific scenarios.
Without going into too much detail, the app will conduct one-on-one conversations with users in a specific way, tailored to the scenario the user chooses in the app.
To get started quickly and iterate in an agile manner, I plan to use a proprietary model (such as GPT, Claude, or Gemini) along with RAG and prompt engineering.
The problem I’m facing is that no matter how I write the prompts, the application doesn’t act as expected. It doesn’t behave correctly even when I clearly define its role and instructions.
I’m considering fine-tuning the model and then using the fine-tuned version with RAG. Does this sound like a viable solution to you?
Also, I know that Claude models can’t be fine-tuned (right?), and OpenAI only allows fine-tuning on GPT-3.5 Turbo, not GPT-4o. Should I fine-tune GPT-3.5 Turbo or Google’s Gemini model? Or should I look into fine-tuning an open-source model?
I think the problem you are facing has more to do with Conversational Experience (CX) and Information Architecture than with models and prompt engineering.
From what you describe in your message, the interactions have a “scripted” part, where the agent shouldn’t deviate from a standard path, and then a generative component, where you tailor the Conversational Experience to the user.
Have you clearly defined how and when it needs to stick to the standard path and how and when it needs to tailor the CX to the user’s specific needs?
If I were in your shoes, I would look for a tool or framework like VoiceFlow or BotMaker.
I haven’t tried chatbot tools like VoiceFlow or BotMaker yet. I might need to give them a shot. However, the reason I didn’t consider using those off-the-shelf tools is that the use case for the application I’m developing seems different from the common chatbot scenarios.
Let me explain:
The problem I’m facing isn’t really about defining when the chatbot needs to follow a standard path and when it needs to address specific user needs.
The application (or chatbot) I’m developing needs to pull background information from a local knowledge base and conduct conversations with users in a specific way. (This might sound like a standard AI chatbot use case.)
The difficulty is that the application (or chatbot) isn’t behaving in conversations the way I want it to.
For example, I want the chatbot to:
1. Start the conversation by explaining the general context, such as a simulation of commercial negotiations between a supplier and a buyer.
2. Then play the role of the buyer, with the user playing the supplier.
3. Ideally, after completing a “simulated” supplier-buyer discussion, summarize it and provide feedback on what the user did well and what could be improved.
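To make this more concrete, here is a simplified sketch of the kind of system prompt I mean (the wording is illustrative rather than my actual prompt, and it assumes an OpenAI-style chat API where a system message carries the standing instructions):

```python
# Illustrative system prompt (simplified); assumes an OpenAI-style chat API
# where a "system" message carries the standing instructions.
SYSTEM_PROMPT = """You are running a negotiation practice simulation.

Phase 1 (setup): Explain the general context: a simulated commercial
negotiation between a supplier and a buyer. The user plays the supplier;
you play the buyer.

Phase 2 (role-play): Stay strictly in character as the buyer. Do not give
advice, do not explain negotiation tactics, and do not break character,
even if the user asks for help. If the user asks for help, answer in
character, e.g. as a buyer still waiting for the supplier's offer.

Phase 3 (debrief): Only after the negotiation ends, step out of character,
summarize the discussion, and give feedback on what the user did well and
what could be improved."""
```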
The problem is, even with explicit role instructions like these, the chatbot doesn’t always stick to its role as the buyer. If the user suddenly says something like “I don’t know what to do, could you help?”, the chatbot drops its role and starts explaining how to behave in such situations.
Since I can’t control what the user will say, I can’t define rules for the chatbot covering every possible situation. This is also why I haven’t tried tools like VoiceFlow or BotMaker: they seem to rely on a combination of LLMs and predefined rules.
I hope my explanation is clear, and I really appreciate any other suggestions you might have.
I’m glad to offer some ideas to help you address the challenge you’re facing.
Tools like the ones I referred to are great because they let you define the information architecture (the flow diagram) of the CX and use Gen AI only to generate the dialog lines (which makes it really easy to do RAG and API calls to fill in the context).
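To illustrate the pattern outside of any particular tool: a tiny state machine owns the path, and the model only writes the lines for the current state. This is just a sketch, assuming the OpenAI Python client and a placeholder model name:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The "flow diagram": each state has a fixed purpose. The model never
# chooses the path; the driver loop below does.
FLOW = {
    "intro": ("Explain the context: a simulated commercial negotiation "
              "between a supplier and a buyer. The user plays the supplier; "
              "you play the buyer."),
    "roleplay": ("Stay strictly in character as the buyer. Do not give "
                 "advice or break character for any reason."),
    "debrief": ("Step out of character. Summarize the negotiation and give "
                "feedback on what the user did well and what to improve."),
}

def generate_line(state: str, history: list[dict]) -> str:
    """Have the model write only the next line, constrained to the current state."""
    messages = [{"role": "system", "content": FLOW[state]}] + history
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder; use whichever model you prefer
        messages=messages,
    )
    return resp.choices[0].message.content

state = "intro"
history: list[dict] = []
while True:
    bot_line = generate_line(state, history)
    print("BOT:", bot_line)
    history.append({"role": "assistant", "content": bot_line})
    if state == "debrief":  # feedback delivered, conversation over
        break
    user_line = input("YOU: ")
    history.append({"role": "user", "content": user_line})
    # Deterministic transitions: the code, not the model, decides what's next.
    if state == "intro":
        state = "roleplay"
    elif user_line.strip().lower() == "/end":  # e.g. an "End simulation" button in a real UI
        state = "debrief"
```

The model can’t wander off the path because the path isn’t its decision; tools like VoiceFlow essentially give you a visual editor for this kind of flow.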
From what you’re telling me, it seems you’re letting the model choose its own path. I think you need to “hold the leash” tighter, so to speak.
It would be useful to revisit your prompt engineering too.
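On the prompt side, one thing that often helps with role drift is re-asserting the role on every turn instead of relying on a single system message at the top of the conversation. A minimal sketch (the reminder wording is just an example):

```python
ROLE_REMINDER = (
    "Reminder: you are the BUYER in this negotiation simulation. Stay in "
    "character. Do not coach or advise the user, even if they ask for help."
)

def build_messages(system_prompt: str, history: list[dict], user_input: str) -> list[dict]:
    """Inject the role reminder just before the latest user message, so it is
    always the most recent instruction the model sees."""
    return (
        [{"role": "system", "content": system_prompt}]
        + history
        + [
            {"role": "system", "content": ROLE_REMINDER},
            {"role": "user", "content": user_input},
        ]
    )
```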
Thanks for sharing your parameters. The first thing I would try, before even going to VoiceFlow or BotMaker, is reducing the temperature to 0 if you want predictable results. I’m confident it will reduce the problem you described.
I would advise keeping the temperature at 0 even if you port your workflow to VoiceFlow.
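In code it’s a one-parameter change. For example, with the OpenAI Python client (the model name and messages are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder; use whichever model you settled on
    messages=[
        {"role": "system", "content": "You are the buyer in a negotiation simulation."},
        {"role": "user", "content": "Hello, I'm the supplier. Shall we begin?"},
    ],
    temperature=0,  # near-greedy decoding: far more repeatable outputs
)
print(resp.choices[0].message.content)
```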
It’s working better now with the lower temperature and some fine-tuning. I also tried VoiceFlow, which is indeed a useful tool. However, since there is no defined pattern to the interview conversations my project handles, and user interactions with the chatbot are largely unpredictable, I find it challenging to set relevant rules or paths in VoiceFlow to cover the different scenarios.
Therefore, I plan to complete the first version of my project quickly, let people use it, and then improve it gradually.