“Distillation refers to the student model outputs as the hard predictions and hard labels.”
From the video (~4:30): “model-optimizations-for-deployment”
The description of the distillation process is confusing to me. The video describes the output of the Teacher as “soft labels.” From what I understand, the Student is then trained to minimize the loss between the “soft labels” and the “soft predictions.”
I’m not sure where the “hard labels” that are “output” from the Student model come from. My belief is that the “hard labels” are actually the Teacher’s “soft labels” once they are done being used to train the Student, and that the “hard predictions” are the Student’s predictions once training is complete.
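For concreteness, here is a minimal sketch of the part I think I follow, assuming a PyTorch-style setup (the function name, the temperature value, and the T² scaling are my own assumptions, not from the video):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """The soft-label part of distillation as I understand it from the video.

    "Soft labels"      = the Teacher's temperature-softened probabilities.
    "Soft predictions" = the Student's temperature-softened probabilities.
    """
    soft_labels = F.softmax(teacher_logits / temperature, dim=-1)
    soft_predictions = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between the soft labels and the soft predictions;
    # scaling by T^2 is a common convention to keep gradient magnitudes
    # comparable across temperatures (my assumption, not from the video).
    return F.kl_div(soft_predictions, soft_labels,
                    reduction="batchmean") * temperature ** 2

# Where I get lost: I don't see where the "hard labels" would enter here --
# are they a separate loss term computed from the Student's ordinary (T=1)
# outputs, or the Teacher's soft labels after training, as I guessed above?
```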
Can anyone confirm this?