|
Why do we consider $$z = \vec{w} \cdot \vec{x} + b$$ when implementing a classification model using a sigmoid function?
|
|
2
|
25
|
August 25, 2025
|
|
What is the difference between tools and resources in the context of APIs?
|
|
1
|
320
|
May 21, 2025
|
|
Multi-head attention
|
|
3
|
71
|
February 17, 2025
|
|
Week 3: Distillation: Train a big model, then use that to train a small model, seems convoluted?
|
|
3
|
406
|
January 25, 2025
|
|
Sliding Window vs. Convolution
|
|
3
|
273
|
December 10, 2024
|
|
Why use log sigmoid, log(σ(r_j - r_k)), as the loss function to train a reward model?
|
|
12
|
993
|
September 26, 2024
|
|
What could be the error in training the detection model?
|
|
6
|
42
|
July 22, 2024
|
|
How does attention work?
|
|
1
|
283
|
May 1, 2024
|
|
Intuition regarding why the output of "scaled dot-product" attention represents similarity between tokens
|
|
1
|
252
|
May 1, 2024
|
|
Why columns of the W1 matrix correspond to the words at the corresponding index in V (vocabulary)
|
|
3
|
174
|
April 30, 2024
|
|
Sum of probabilities of n-length sentences = 1 (1 <= n < inf)
|
|
1
|
200
|
April 27, 2024
|
|
A thought/doubt on the ANN: Scaling face recognition to a million people
|
|
4
|
173
|
April 27, 2024
|
|
One shot/multi prompt engineering intuition
|
|
2
|
405
|
February 20, 2024
|