Feedback on E-commerce Product Similarity Model for Fine-Tuning

Hi everyone,

I’m a beginner working on a project to improve product recommendations for an e-commerce platform. The goal is to calculate product similarity to generate labels for fine-tuning a model that produces vector embeddings. The features I plan to use include:

  1. Textual Descriptions: Using embeddings from pre-trained models (like Sentence-BERT).
  2. Category: Similar categories increase similarity.
  3. Brand: Similar brands also contribute to similarity.
  4. Price: Products with similar prices are considered more alike.

I intend to compute a final similarity score as a weighted combination of these features and use it to label product pairs for fine-tuning. The fine-tuned embeddings should give more relevant results, and I will index them with FAISS (Facebook AI Similarity Search) to efficiently find and retrieve similar products.
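
A minimal sketch of the weighted combination described above, using only NumPy. Everything specific here is an assumption for illustration: the dictionary keys, the exact-match scoring for category and brand, the min/max ratio for price (which requires positive prices), and the example weights. The text vectors stand in for embeddings from a pre-trained model such as Sentence-BERT.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def price_similarity(p1, p2):
    """Ratio of the smaller to the larger price: 1.0 when equal,
    approaching 0 as the prices diverge. Assumes positive prices."""
    return min(p1, p2) / max(p1, p2)

def product_similarity(a, b, weights):
    """Weighted average of per-feature similarities, each scaled to [0, 1]."""
    scores = {
        # Cosine lies in [-1, 1]; shift and scale it to [0, 1].
        "text": (cosine(a["text_vec"], b["text_vec"]) + 1.0) / 2.0,
        # Hypothetical exact-match scoring; a category tree could give partial credit.
        "category": 1.0 if a["category"] == b["category"] else 0.0,
        "brand": 1.0 if a["brand"] == b["brand"] else 0.0,
        "price": price_similarity(a["price"], b["price"]),
    }
    total = sum(weights.values())
    return sum(weights[name] * scores[name] for name in scores) / total

# Illustrative weights only; they are not tuned values.
weights = {"text": 0.5, "category": 0.2, "brand": 0.1, "price": 0.2}

shoe = {"text_vec": np.array([0.9, 0.1, 0.0]), "category": "shoes",
        "brand": "Acme", "price": 50.0}
boot = {"text_vec": np.array([0.8, 0.3, 0.1]), "category": "shoes",
        "brand": "Other", "price": 80.0}

label = product_similarity(shoe, boot, weights)  # a score in [0, 1]
```

Scores like `label` could then serve as soft targets for pairs of products during fine-tuning. Dividing by the sum of the weights keeps the result in [0, 1] even if the weights do not add up to 1.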

Questions:

  1. Is this approach sufficient, or should I incorporate additional features?
  2. What are the best practices for fine-tuning embeddings for this task?
  3. How can I optimize the weights assigned to each similarity factor?

Any insights or suggestions would be greatly appreciated!

Thanks

My thoughts are as follows:

  1. Is this approach sufficient, or should I incorporate additional features? - You have to build it, test it, and see whether it is sufficient.
  2. What are the best practices for fine-tuning embeddings for this task? - Check the Gen AI with LLMs specialization here; the LoRA and PEFT methods are introduced there.
  3. How can I optimize the weights assigned to each similarity factor? - Maybe you could use Reinforcement Learning here when comparing similarity scores!
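
For point 3, a simpler baseline than Reinforcement Learning may be worth trying first. The sketch below is a plain grid search, not RL: it assumes you have a small set of product pairs labeled by hand (1 = similar, 0 = not similar) and the per-feature similarity scores for each pair, and it picks the weights that minimize the squared error against those labels. The function name, the step size, and the toy data are all hypothetical.

```python
import itertools
import numpy as np

def grid_search_weights(feature_scores, labels, step=0.25):
    """Search weight vectors that sum to 1 and return the one with the
    lowest mean squared error against the labels.

    feature_scores: (n_pairs, n_features) per-feature similarities in [0, 1]
    labels:         (n_pairs,) with 1 for similar pairs and 0 otherwise
    """
    feature_scores = np.asarray(feature_scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n_features = feature_scores.shape[1]
    grid = np.arange(0.0, 1.0 + 1e-9, step)

    best_w, best_err = None, np.inf
    for w in itertools.product(grid, repeat=n_features):
        if abs(sum(w) - 1.0) > 1e-9:
            continue  # keep only weight vectors that sum to 1
        predicted = feature_scores @ np.array(w)
        err = float(np.mean((predicted - labels) ** 2))
        if err < best_err:
            best_w, best_err = np.array(w), err
    return best_w, best_err

# Toy data: columns are text, category, brand, price similarity.
scores = [
    [1.0, 0.5, 0.5, 0.5],
    [0.0, 0.5, 0.5, 0.5],
    [1.0, 0.2, 0.9, 0.1],
    [0.0, 0.8, 0.1, 0.9],
]
labels = [1, 0, 1, 0]

best_w, best_err = grid_search_weights(scores, labels)
```

The number of candidates grows quickly with more features or a finer step, so this only suits a handful of features. With more labeled pairs, fitting the weights with a regression model would be a natural next step.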