Hi everyone,
I’m a beginner working on a project to improve product recommendations for an e-commerce platform. The goal is to calculate product similarity to generate labels for fine-tuning a model that produces vector embeddings. The features I plan to use include:
- Textual Descriptions: Using embeddings from pre-trained models (like Sentence-BERT).
- Category: Products in the same or closely related categories score as more similar.
- Brand: Products from the same or comparable brands also score as more similar.
- Price: Products in a similar price range score as more similar.
I intend to compute a final similarity score as a weighted combination of these features and use that score as the training label. The aim of the fine-tuning is more relevant results: I will index the fine-tuned embeddings with FAISS (Facebook AI Similarity Search) to efficiently find and retrieve similar products.
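To make the weighted combination concrete, here is a minimal sketch of what I have in mind. Everything specific in it is a placeholder assumption on my part: the weight values, the dictionary field names, exact-match scoring for category and brand, and the price-ratio formula. The short vectors stand in for real Sentence-BERT embeddings.

```python
import math

# Hypothetical weights for each feature; placeholders, not tuned values.
WEIGHTS = {"text": 0.5, "category": 0.2, "brand": 0.1, "price": 0.2}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def price_similarity(p1, p2):
    """Ratio of the smaller price to the larger one (1.0 when equal)."""
    if p1 <= 0 or p2 <= 0:
        return 0.0
    return min(p1, p2) / max(p1, p2)


def product_similarity(a, b, weights=WEIGHTS):
    """Weighted average of per-feature similarities, each in [0, 1]."""
    scores = {
        # Negative cosine values are clipped so every feature stays in [0, 1].
        "text": max(0.0, cosine(a["embedding"], b["embedding"])),
        # Assumption: exact match only; a category tree could give partial credit.
        "category": 1.0 if a["category"] == b["category"] else 0.0,
        "brand": 1.0 if a["brand"] == b["brand"] else 0.0,
        "price": price_similarity(a["price"], b["price"]),
    }
    total = sum(weights.values())
    return sum(weights[k] * scores[k] for k in scores) / total


# Toy products; the 3-dimensional vectors are made up for illustration.
shoe_a = {"embedding": [0.9, 0.1, 0.0], "category": "shoes", "brand": "Acme", "price": 60.0}
shoe_b = {"embedding": [0.8, 0.2, 0.1], "category": "shoes", "brand": "Zenith", "price": 75.0}

print(product_similarity(shoe_a, shoe_b))
```

Dividing by the sum of the weights keeps the label in [0, 1] even if the weights do not add up to 1, which should make it easier to try different weightings later.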
Questions:
- Is this approach sufficient, or should I incorporate additional features?
- What are the best practices for fine-tuning embeddings for this task?
- How can I optimize the weights assigned to each similarity factor?
Any insights or suggestions would be greatly appreciated!
Thanks