Week 3: Distillation: Train a big model, then use that to train a small model, seems convoluted?
|
|
3
|
104
|
January 25, 2025
|
Sliding Window vs. Convolution
|
|
3
|
241
|
December 10, 2024
|
What could be the error in training the detection model?
|
|
6
|
29
|
July 22, 2024
|
How does attention work?
|
|
1
|
223
|
May 1, 2024
|
Intuition regarding why the output of "scaled dot-product" attention represents similarity between tokens
|
|
1
|
216
|
May 1, 2024
|
Why columns of the W1 matrix correspond to the words at the corresponding index in V (vocabulary)
|
|
3
|
167
|
April 30, 2024
|
Sum of probabilities of n-length sentences = 1 (1 <= n < inf)
|
|
1
|
172
|
April 27, 2024
|
A thought/doubt on the ANN: Scaling face recognition to a million people
|
|
4
|
166
|
April 27, 2024
|