Text embeddings

Manish_Kamble · February 10, 2024, 8:55pm

What are some other ways to create text embeddings?
I am trying LLMs, but ChatGPT comes with added cost, and Hugging Face models are too large: I don't have sufficient memory to run them.

Is there any other way to create text embeddings for 100,000 rows of review text?

balaji.ambresh · February 11, 2024, 4:53am

Have you seen this tutorial on creating word embeddings?

BPE is effective because it usually produces far fewer OOV tokens than word-level tokenization. You can always train a subword tokenizer and an embedding layer from scratch with a restricted vocabulary size. Alternatively, you can warm-start the embedding layer with pretrained weights for the tokens in your custom subword tokenizer's vocabulary.
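A minimal NumPy sketch of the warm-start idea, assuming you already have a token-to-index `vocab` from your subword tokenizer and a `pretrained` dict of token vectors (for example loaded from GloVe or word2vec files). Both function names are hypothetical, and the mean pooling used to turn token vectors into one vector per review is just one simple option, not something prescribed above.

```python
import numpy as np


def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Warm-start an embedding matrix (hypothetical helper).

    Rows for tokens found in `pretrained` are copied from it; all other rows
    (e.g. subword pieces with no pretrained vector) are initialised randomly.
    Returns the matrix and how many rows were warm-started.
    """
    rng = np.random.default_rng(seed)
    matrix = rng.normal(0.0, 0.1, size=(len(vocab), dim)).astype(np.float32)
    hits = 0
    for token, idx in vocab.items():
        vec = pretrained.get(token)
        if vec is not None and len(vec) == dim:
            matrix[idx] = vec
            hits += 1
    return matrix, hits


def embed_text(token_ids, matrix):
    """One simple text embedding: the mean of the token vectors."""
    if not token_ids:
        return np.zeros(matrix.shape[1], dtype=matrix.dtype)
    return matrix[token_ids].mean(axis=0)


if __name__ == "__main__":
    # Hypothetical toy vocabulary and pretrained vectors.
    vocab = {"good": 0, "food": 1, "est": 2}
    pretrained = {"good": np.ones(4), "food": np.full(4, 3.0)}
    matrix, hits = build_embedding_matrix(vocab, pretrained, dim=4)
    print(hits, embed_text([0, 1], matrix))
```

The matrix can then serve as the initial weights of a trainable embedding layer; in Keras, for instance, via `tf.keras.layers.Embedding(len(vocab), dim, embeddings_initializer=tf.keras.initializers.Constant(matrix))`, so that training fine-tunes the warm-started rows and learns the random ones.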

Manish_Kamble · February 11, 2024, 12:51pm

@balaji.ambresh
Thanks for replying.

I will try this method, thanks!
Is it effective for a large amount of text data, though?

balaji.ambresh · February 11, 2024, 2:40pm

Play with the vocabulary size and embedding dimension to work within the memory constraints of your system.
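A quick back-of-the-envelope helper for that trade-off, assuming float32 weights (4 bytes per parameter). An embedding table holds `vocab_size × embed_dim` parameters, so halving either one halves its memory. The function name is hypothetical, and the estimate covers only the table itself, not activations or optimizer state (Adam, for example, keeps two extra values per parameter during training).

```python
def embedding_memory_mib(vocab_size, embed_dim, bytes_per_param=4):
    """Approximate embedding-table size in MiB (hypothetical helper).

    vocab_size * embed_dim parameters, each bytes_per_param bytes
    (4 for float32). Excludes optimizer state and activations.
    """
    return vocab_size * embed_dim * bytes_per_param / 1024**2


if __name__ == "__main__":
    # Compare a few illustrative vocabulary/dimension settings.
    for vocab_size, dim in [(50_000, 300), (16_000, 128), (8_000, 64)]:
        size = embedding_memory_mib(vocab_size, dim)
        print(f"vocab={vocab_size:>6}, dim={dim:>3}: {size:.1f} MiB")
```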

kwiseth · March 29, 2024, 11:09pm

BPE: Byte Pair Encoding
OOV: Out of vocabulary
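To make the BPE/OOV point concrete, here is a minimal pure-Python sketch of classic BPE merge learning: start from characters, repeatedly merge the most frequent adjacent pair, then segment new words by replaying the merges in order. The function names and the toy corpus are hypothetical, and real tokenizer libraries add many refinements; the point is only that an unseen word breaks into known subwords or characters instead of becoming a single OOV token.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules; num_merges bounds the vocabulary growth."""
    # Each word starts as its characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair (first seen wins ties)
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges


def apply_bpe(word, merges):
    """Segment a (possibly unseen) word by replaying merges in learned order."""
    symbols = list(word) + ["</w>"]
    for pair in merges:
        i = 0
        while i < len(symbols) - 1:
            if (symbols[i], symbols[i + 1]) == pair:
                symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
            else:
                i += 1
    return symbols


if __name__ == "__main__":
    # Hypothetical toy corpus standing in for review text.
    corpus = "the food was great the service was slow the food was cold".split()
    merges = learn_bpe(corpus, num_merges=20)
    # "greatest" never appears, but it is still split into known pieces.
    print(apply_bpe("greatest", merges))
```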

Related topics

Topic	Category (tag)	Replies	Views	Last activity
[Week 2] - Embedding and Transfer Learning	Sequence Models (coursera-platform)	6	613	May 24, 2021
Creating word embeddings	NLP with Classification and Vector Spaces (week-module-3)	2	321	July 30, 2024
Week2, emojify v_2, Embedding layer	Sequence Models (coursera-platform)	2	526	April 21, 2022
English word embeddings	NLP with Classification and Vector Spaces (week-module-4)	2	342	October 18, 2023
About word embeddings in the CBOW model	NLP with Probabilistic Models (week-module-4)	1	520	December 1, 2022
