Is there a way to create a unimodal embedding from text that lies in the same embedding space as the cross-modal embeddings?
The use case is retrieving image-text pairs from video segments based on a text query.