Week 1: Pretraining Large Language Models


Significance of scale
The larger a model is, the more likely it is to work as needed without additional in-context learning or further training.
This observed trend of increased model capability with size has driven the development of larger models.

Is there more info, or links to papers, that prove this? Is it simply that with more data we are better fitting models that were previously under-fit?

Do we have metrics to check how well these LLMs are fitting? Can we determine accuracy, precision, recall, F1-score, etc. for LLMs?

Does it all boil down to how the word/sentence embeddings are computed? Shallow bag-of-words models (Latent Semantic Analysis, Latent Dirichlet Allocation, Term Frequency-Inverse Document Frequency) vs. neural-network-based doc2vec/BERT/GPT models?

There are studies out there; a quick Google search will turn them up. But yes, the underlying point is that with more data you get a better fit.

Yes, there are metrics: the ones you mentioned, plus others such as the ROUGE score. You will see how they are used in the coming weeks.
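
To make the ROUGE idea concrete, here is a minimal sketch of ROUGE-1 (unigram overlap) computed by hand in Python. The reference and candidate sentences are made up for illustration, and in practice you would use a library such as `rouge_score` rather than rolling your own.

```python
from collections import Counter

def rouge_1(reference: str, candidate: str) -> dict:
    """Compute unigram-overlap ROUGE-1 precision, recall, and F1."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Each unigram counts at most as often as it appears in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical reference summary and model output, purely for illustration.
reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(rouge_1(reference, candidate))  # precision, recall, and F1 are all 5/6 here
```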

Not just that, but it is a crucial part; the Transformer architecture is also a very important evolutionary step.
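
To illustrate the contrast the question draws between shallow bag-of-words representations and dense learned embeddings, here is a minimal sketch assuming scikit-learn and NumPy are available. The corpus, embedding dimension, and random vectors are all made up for illustration; a real neural model such as BERT learns its dense vectors during training rather than sampling them.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus, made up for illustration.
docs = [
    "large language models learn from text",
    "tf idf counts words without word order",
]

# Sparse bag-of-words representation: one dimension per vocabulary word.
tfidf = TfidfVectorizer()
sparse_vectors = tfidf.fit_transform(docs)
print(sparse_vectors.shape)  # (2, vocabulary_size)

# Dense representation: each token maps to a low-dimensional vector.
# Here the table is random; a neural model learns it during training.
rng = np.random.default_rng(0)
vocab = tfidf.get_feature_names_out()
embedding_dim = 8
embedding_table = {word: rng.normal(size=embedding_dim) for word in vocab}

# A crude document embedding: average the word vectors of the first document.
doc_embedding = np.mean(
    [embedding_table[w] for w in docs[0].split() if w in embedding_table], axis=0
)
print(doc_embedding.shape)  # (8,)
```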
