Could someone explain what embedding is and why it is needed in an intuitive way?
As a general concept:
- Embedding captures the relationships between words by representing each one as a vector of numbers, so that related words end up with similar vectors (a small sketch is below).
- Given an initial word, this lets a model make predictions about which words may follow.
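A minimal sketch of that idea, using made-up 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions, and the numbers here are invented purely for illustration):

```python
import numpy as np

# Toy embeddings: each word maps to a vector (values invented for illustration).
embeddings = {
    "cat": np.array([0.9, 0.1, 0.0]),
    "dog": np.array([0.8, 0.2, 0.1]),
    "car": np.array([0.0, 0.9, 0.4]),
}

def cosine_similarity(a, b):
    """Similarity of two vectors: close to 1.0 means very similar, near 0.0 means unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Related words get similar vectors, so their similarity score is higher.
print(cosine_similarity(embeddings["cat"], embeddings["dog"]))  # higher: related meaning
print(cosine_similarity(embeddings["cat"], embeddings["car"]))  # lower: less related
```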
Thanks! How is embedding related to tokenization?
In essence (not technically complete):
- Sentences are composed of tokens. Tokens are the standardized building blocks that make up a sentence. A token might be the root form of a word, and it might also include punctuation.
- Embeddings give you the relationships between tokens within a specific language (see the sketch after this list).
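To make the token/embedding split concrete, here is a toy sketch (the vocabulary, token IDs, and vector values are all invented; real tokenizers and embedding tables are far larger):

```python
import numpy as np

# Step 1: tokenization -- a sentence becomes a list of tokens from a fixed vocabulary.
# (Real tokenizers split words into subword pieces and handle punctuation; this is a toy version.)
vocab = {"the": 0, "cat": 1, "sat": 2, ".": 3}
sentence = ["the", "cat", "sat", "."]
token_ids = [vocab[token] for token in sentence]   # [0, 1, 2, 3]

# Step 2: embedding -- each token ID looks up a row in an embedding matrix,
# one row per vocabulary entry, here with 3 dimensions (invented numbers).
embedding_matrix = np.array([
    [0.1, 0.3, 0.2],   # "the"
    [0.9, 0.1, 0.0],   # "cat"
    [0.2, 0.8, 0.5],   # "sat"
    [0.0, 0.0, 0.1],   # "."
])
sentence_vectors = embedding_matrix[token_ids]     # shape: (4 tokens, 3 dimensions)
print(sentence_vectors.shape)
```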
Thanks! Very helpful!
To continue the conversation about tokens and embeddings: what happens if we have a special slang or industry term in our data that the LLM hasn't been trained on?
- Would we have to do full fine-tuning to teach the model how the new word relates to all other words, or could we somehow piggyback off the knowledge the LLM already has of its synonyms?
- And how can we do the tokenization of the new word in the first place?
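Not a full answer, but on that last point: most LLM tokenizers are subword tokenizers, so a word they have never seen as a whole is usually broken into smaller pieces that are already in the vocabulary, rather than becoming a single unknown token. A rough sketch, assuming the Hugging Face transformers library and the GPT-2 tokenizer (both are my assumptions, not something from this thread):

```python
from transformers import AutoTokenizer

# Load a pretrained subword (BPE) tokenizer.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# A made-up slang/industry term the tokenizer has likely never seen as a whole word.
new_word = "fintechification"

# The tokenizer falls back to smaller subword pieces that ARE in its vocabulary,
# so the word still gets token IDs (and therefore embeddings) without any retraining.
print(tokenizer.tokenize(new_word))
print(tokenizer.encode(new_word))
```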