How do we calculate the loss in transformer architectures like BERT, T5, etc., since the labels are categorical?
Perhaps by using a categorical cross-entropy cost function.
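For reference, a minimal sketch of the categorical cross-entropy for a single predicted token position, where $p_c$ is the model's predicted probability for token (class) $c$ and $y$ is the one-hot target over the vocabulary $V$:

$$
\mathcal{L} = -\sum_{c=1}^{|V|} y_c \log p_c = -\log p_{\text{target}}
$$

The total loss is then averaged over all labeled token positions in the batch.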
As Tom correctly answered, language models are trained by minimizing cross-entropy loss, whether they are transformers or not. So yes, both T5 and BERT, as well as other transformer architectures for language modeling, minimize cross-entropy loss, and the reason is exactly what you mentioned: the labels are categorical, and the model outputs a probability distribution over those categories (the vocabulary).
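In case it helps, here is a minimal PyTorch sketch (with made-up tensor sizes, not the exact Hugging Face internals) of how the token-level cross-entropy is typically computed from the LM head logits:

```python
import torch
import torch.nn.functional as F

batch_size, seq_len, vocab_size = 2, 8, 32000  # hypothetical sizes

logits = torch.randn(batch_size, seq_len, vocab_size)        # LM head scores per token
labels = torch.randint(0, vocab_size, (batch_size, seq_len))  # target token ids
labels[0, :3] = -100  # common convention: positions marked -100 are ignored
                      # (e.g. unmasked tokens in BERT, padding in T5)

# cross_entropy expects (N, C) inputs, so flatten batch and sequence dims
loss = F.cross_entropy(
    logits.view(-1, vocab_size),  # (batch*seq, vocab)
    labels.view(-1),              # (batch*seq,)
    ignore_index=-100,            # skip the ignored positions
)
print(loss)
```

This is the same categorical cross-entropy as in any classifier; the "classes" are just the vocabulary tokens, and the loss is averaged over all non-ignored positions.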
Cheers