Using an LLM to Classify User Utterances into Intents and Evaluate the Results

Enhancing User Utterance Classification with LLMs

I am working on a project that uses large language models (LLMs) to classify user utterances into intents. The dataset consists of user utterances related to car insurance. The goal is to group the utterances into clusters and then predict the intent behind each cluster.

For example, in the context of car insurance, user intents might include:

  • Adding an additional driver
  • Requesting an evaluation
  • Updating an evaluation
  • Filing a claim
  • Inquiring about policy details

Due to client data privacy concerns, I am unable to use external LLM APIs. Therefore, I am running everything locally using Ollama.
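To make the local setup concrete, here is a minimal sketch of zero-shot intent classification against a local Ollama server. It assumes Ollama's default HTTP endpoint on port 11434 and uses a placeholder model name (`llama3`); the prompt wording and the helper names `build_prompt`, `parse_intent`, and `classify` are illustrative, not a tested recipe. The intent list is taken from the examples above.

```python
import json
import re
import urllib.request

# Intent labels taken from the examples in this post; extend as needed.
INTENTS = [
    "Adding an additional driver",
    "Requesting an evaluation",
    "Updating an evaluation",
    "Filing a claim",
    "Inquiring about policy details",
]

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint
MODEL = "llama3"  # placeholder: use whichever model you have pulled locally


def build_prompt(utterance, intents=INTENTS):
    """Build a prompt that asks the model to answer with an intent number."""
    options = "\n".join(f"{i}. {name}" for i, name in enumerate(intents, start=1))
    return (
        "Classify the car insurance utterance into exactly one intent.\n"
        f"Intents:\n{options}\n"
        f"Utterance: {utterance}\n"
        "Answer with the number of the intent only."
    )


def parse_intent(reply, intents=INTENTS):
    """Map the model's reply to an intent label, or None if it cannot be parsed."""
    match = re.search(r"\d+", reply)
    if match is None:
        return None
    index = int(match.group())
    if 1 <= index <= len(intents):
        return intents[index - 1]
    return None


def classify(utterance):
    """Send one utterance to the local Ollama server and return the predicted intent."""
    payload = json.dumps(
        {"model": MODEL, "prompt": build_prompt(utterance), "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read())
    return parse_intent(body["response"])
```

Calling `classify("I had an accident yesterday and need to report it")` requires a running Ollama server with the chosen model available. Returning `None` for unparseable replies keeps malformed outputs visible instead of silently assigning them to an intent.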

Questions to the Community:

  1. Project Execution After Preprocessing:
     • What is the best approach for clustering after preprocessing the data?
     • Which clustering mechanisms and embedding techniques would you recommend?
     • Which LLMs would be best suited for this task?
  2. Performance Evaluation of the Classification Model:
     • How can I effectively evaluate the performance of the classification model?
  3. Evaluating Intent Prediction from LLM:
     • What methods or metrics can I use to assess the accuracy of intent prediction by the LLM?
  4. Identifying the Number of Clusters:
     • Since I do not know the number of clusters beforehand, what is the best mechanism to identify the optimal number of clusters?
  5. Local Machine and Cloud Options:
     • I am currently using Anaconda Jupyter Lab on a local machine with 16GB RAM and a default Windows GPU. Is this setup sufficient, or should I consider using other machines or cloud options like SageMaker or Vertex AI?
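
To make the evaluation questions concrete, here is a minimal sketch of one common way to score intent predictions. It assumes a small hand-labeled sample of utterances is available (the post does not say whether one exists) and computes accuracy plus per-intent precision, recall, and F1 in plain Python. The function names and the toy labels are illustrative only.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of utterances whose predicted intent matches the gold intent."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def per_intent_scores(y_true, y_pred):
    """Precision, recall, and F1 for every intent that appears in the gold labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but the gold label was something else
            fn[t] += 1  # gold label t was missed
    scores = {}
    for label in sorted(set(y_true)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def macro_f1(scores):
    """Unweighted mean of per-intent F1, so rare intents count as much as common ones."""
    return sum(s["f1"] for s in scores.values()) / len(scores)


# Toy example with made-up labels, only to show the call pattern.
gold = ["claim", "claim", "policy", "driver"]
pred = ["claim", "policy", "policy", "driver"]
scores = per_intent_scores(gold, pred)
print(accuracy(gold, pred))
print(round(macro_f1(scores), 4))
```

Macro-averaged F1 is shown alongside accuracy because intent datasets are often imbalanced, and accuracy alone can hide poor performance on rare intents. Scores here are reported only for intents present in the gold labels, so an unparseable prediction (for example `None`) counts as a miss for the gold intent without creating a class of its own.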