What are some other ways to create text embeddings?
I am trying LLMs, but ChatGPT comes with added cost, and the Hugging Face models are too large: I don't have enough memory to run them.
Is there any other way to create text embeddings for 100000 rows of review texts?
Have you seen this tutorial on creating word embeddings?
BPE (byte-pair encoding) is effective because it usually produces far fewer OOV (out-of-vocabulary) tokens than word-level embeddings do. You can always train a subword tokenizer and an embedding layer from scratch, with a cap on the vocabulary size. Alternatively, you can warm-start the embedding layer with pretrained weights for the vocabulary of a custom subword tokenizer.
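As a rough illustration of why subword units keep the OOV count low, here is a minimal pure-Python sketch of how BPE merges are learned. The function names (`learn_bpe`, `merge_pair`, `encode_word`), the `</w>` end-of-word marker, and the sample reviews are assumptions made up for this example; for 100000 reviews you would more likely reach for an optimized tokenizer library than this toy loop.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(texts, num_merges):
    """Learn up to `num_merges` merge rules from an iterable of strings."""
    # Each word starts as a tuple of characters plus an end-of-word marker.
    word_freqs = Counter()
    for text in texts:
        for word in text.lower().split():
            word_freqs[tuple(word) + ("</w>",)] += 1

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pair_counts = Counter()
        for symbols, freq in word_freqs.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)

        # Apply the chosen merge to every word before the next round.
        merged_freqs = Counter()
        for symbols, freq in word_freqs.items():
            merged_freqs[merge_pair(symbols, best)] += freq
        word_freqs = merged_freqs
    return merges


def encode_word(word, merges):
    """Split a (possibly unseen) word into subword units using the learned merges."""
    symbols = tuple(word.lower()) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical sample data, only to show the call pattern.
reviews = ["great product great price", "not great", "greatest purchase"]
merges = learn_bpe(reviews, num_merges=10)
print(encode_word("greater", merges))  # an unseen word still splits into known pieces
```

The number of merges sets the vocabulary size (base characters plus one new symbol per merge), so it is the knob that trades subword length against the size of the embedding table. A word never seen in training still decomposes into learned pieces or single characters; only a character absent from the training data would be truly unknown in this sketch.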
Thanks for replying.
I will try this method, thanks!
Is it effective for a large amount of text data, though?
Play with vocabulary size and embedding dimension to tackle the memory constraint on your system.
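To make that trade-off concrete, here is a back-of-the-envelope sketch of how much memory the embedding table itself needs for a few settings. It assumes 4-byte float32 values and counts only the stored matrices, not optimizer state or anything else a training run holds in memory; the function name and the candidate sizes are hypothetical.

```python
def float_matrix_bytes(rows, cols, bytes_per_value=4):
    """Memory for a dense rows x cols matrix (4 bytes per value assumes float32)."""
    return rows * cols * bytes_per_value


# Hypothetical (vocabulary size, embedding dimension) candidates to compare.
for vocab_size, dim in [(8_000, 64), (16_000, 128), (32_000, 256)]:
    table_mib = float_matrix_bytes(vocab_size, dim) / 2**20
    # One pooled vector per review, for the 100000 reviews in the question.
    output_mib = float_matrix_bytes(100_000, dim) / 2**20
    print(f"vocab={vocab_size} dim={dim}: "
          f"table {table_mib:.1f} MiB, review vectors {output_mib:.1f} MiB")
```

Both numbers grow linearly with the embedding dimension, while only the table grows with the vocabulary size, so shrinking the dimension helps on both fronts.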