Generative AI With LLMs
Week 1: Pretraining Large Language Models
Significance of scale
The larger a model, the more likely it is to work as needed without additional in-context learning or further training.
This observed trend of increased model capability with size has driven the development of larger models.
Is there more info or are there links to papers that prove this? Is it simply that with more data, we are better fitting the previously under-fit models?
Do we have metrics to check how well these LLMs are fitting? Can we determine accuracy, precision, recall, F1-score, etc. for the LLMs?
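A minimal sketch of two common ways to quantify "fit" (my own illustration, not from the course): perplexity of a causal LM on held-out text as an intrinsic measure, and classic classification metrics when the LLM is used for a labelled downstream task. The model name "gpt2" and the toy data are placeholder assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# --- 1) Intrinsic fit: perplexity on held-out text ---------------------------
tokenizer = AutoTokenizer.from_pretrained("gpt2")   # placeholder model choice
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

held_out = "Large language models are trained on vast corpora of text."  # toy example
inputs = tokenizer(held_out, return_tensors="pt")

with torch.no_grad():
    # With labels == input_ids, the model returns the mean token-level
    # cross-entropy loss; exponentiating it gives perplexity.
    loss = model(**inputs, labels=inputs["input_ids"]).loss
print(f"perplexity: {torch.exp(loss).item():.2f}")

# --- 2) Extrinsic fit: accuracy / precision / recall / F1 --------------------
# Hypothetical gold labels and LLM predictions for a sentiment task.
y_true = ["pos", "neg", "pos", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "neg", "pos"]

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label="pos"
)
print(f"accuracy={accuracy_score(y_true, y_pred):.2f} "
      f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```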
Does it all boil down to how the word/sentence embeddings are computed? Shallow bag-of-words models (Latent Semantic Analysis, Latent Dirichlet Allocation, Term Frequency-Inverse Document Frequency) vs. neural-network-based doc2vec/BERT/GPT models?
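A small sketch contrasting a shallow bag-of-words representation (TF-IDF) with a contextual neural embedding (BERT), again my own illustration rather than course material; "bert-base-uncased" and the example sentences are assumptions. Because bag-of-words discards word order, the two sentences below look identical to TF-IDF but not to the contextual model.

```python
import torch
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import AutoModel, AutoTokenizer

sentences = [
    "The dog chased the cat.",
    "The cat chased the dog.",  # same words, different meaning
]

# --- Bag-of-words: word order is discarded, so both sentences look identical.
tfidf = TfidfVectorizer().fit_transform(sentences)
print("TF-IDF cosine similarity:",
      cosine_similarity(tfidf[0], tfidf[1])[0, 0])   # 1.0

# --- Contextual embeddings: token vectors depend on surrounding words,
#     so the mean-pooled sentence vectors differ.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder model
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

with torch.no_grad():
    enc = tokenizer(sentences, return_tensors="pt", padding=True)
    hidden = model(**enc).last_hidden_state           # (2, seq_len, 768)
    mask = enc["attention_mask"].unsqueeze(-1)
    emb = (hidden * mask).sum(1) / mask.sum(1)        # mean pooling over tokens

print("BERT cosine similarity:",
      cosine_similarity(emb[0:1].numpy(), emb[1:2].numpy())[0, 0])  # < 1.0
```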