- What is beta? We have never used a hyperparameter called beta so far in the course. The same goes for beta1, beta2, etc.
- What is epsilon? We have never used a hyperparameter called epsilon so far in the course.
- When we plot two hyperparameters, say alpha (learning rate) and epsilon, do we want to see how one hyperparameter changes with respect to the other? Ideally we should see how the cost changes with respect to each hyperparameter. What am I missing here?
- While discussing alpha, the professor said in the lecture that 90% of the resources are used on samples from 0.001 to 1 and only 10% of the resources are used on samples from 0.0001 to 0.001. What does this mean?
Hi @ananyboss
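Beta is the hyperparameter that controls the exponential decay rate of a moving average. In Momentum, beta is the decay rate for the moving average of past gradients. In Adam, beta1 plays that role, and beta2 is the decay rate for the moving average of past squared gradients.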
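To get a feel for what the decay rate does, here is a minimal sketch of an exponentially weighted average in plain Python. The function name `ema` and the constant gradient values are made up for illustration; they are not from the course code.

```python
def ema(values, beta):
    """Exponentially weighted average: v = beta * v + (1 - beta) * x."""
    v = 0.0
    averaged = []
    for x in values:
        v = beta * v + (1.0 - beta) * x
        averaged.append(v)
    return averaged


# Hypothetical gradient history: a constant gradient of 1.0 for 50 steps.
grads = [1.0] * 50

# A larger beta remembers more of the past (roughly 1 / (1 - beta) steps),
# so it reacts more slowly to new values.
slow = ema(grads, beta=0.9)  # averages over roughly 10 steps
fast = ema(grads, beta=0.5)  # averages over roughly 2 steps
```

With `beta=0.9` the average needs many more steps to approach the true gradient value than with `beta=0.5`, which is the trade-off the hyperparameter controls.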
Epsilon is a small constant added to the denominator of the Adam (and RMSprop) update to prevent division by zero.
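Here is a rough sketch of a single Adam update for one scalar weight, just to show where beta1, beta2 and epsilon appear. The function name `adam_step` is hypothetical, and the default values (0.9, 0.999, 1e-8) are the commonly quoted ones, assumed here rather than taken from this thread.

```python
import math


def adam_step(w, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar weight (illustrative sketch only)."""
    m = beta1 * m + (1 - beta1) * grad        # moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction, t starts at 1
    v_hat = v / (1 - beta2 ** t)
    # eps keeps the denominator non-zero when v_hat is 0 (e.g. a zero gradient)
    w = w - alpha * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# With a zero gradient, sqrt(v_hat) is 0; without eps this would divide by zero.
w_zero, _, _ = adam_step(w=1.0, grad=0.0, m=0.0, v=0.0, t=1)

# With a non-zero gradient the first step has size close to alpha.
w_step, _, _ = adam_step(w=1.0, grad=2.0, m=0.0, v=0.0, t=1)
```

Epsilon is so small that it barely changes the step in the normal case; it only matters when the squared-gradient average is close to zero.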
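The plot does not show how one hyperparameter changes with respect to the other. Each point on it is one combination of the two hyperparameters to try. The model is trained once per point, and the resulting cost or performance is compared across points, so the goal is to understand how different combinations of these hyperparameters affect the model’s performance.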
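As I understand the lecture, this is about sampling the learning rate on a linear scale. If you pick values uniformly at random over the whole range, say 0.0001 to 1, about 90% of them land in the largest decade (0.1 to 1) and only about 10% land in all the smaller values, so most of the computational resources are spent on one part of the range. That is why the learning rate is usually sampled on a logarithmic scale instead, which spreads the trials evenly across the orders of magnitude.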
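To put numbers on the 90% / 10% remark, here is a small sketch that compares uniform sampling on a linear scale with sampling on a log scale. The range 0.0001 to 1 and the sample count are assumptions chosen for illustration.

```python
import random

random.seed(0)
n = 10_000
low, high = 0.0001, 1.0

# Linear scale: every value in [low, high] is equally likely.
linear = [random.uniform(low, high) for _ in range(n)]

# Log scale: sample the exponent uniformly in [-4, 0], then alpha = 10 ** r.
log_scale = [10 ** random.uniform(-4, 0) for _ in range(n)]


def fraction_in_top_decade(samples):
    """Share of samples that fall between 0.1 and 1."""
    return sum(s >= 0.1 for s in samples) / len(samples)


frac_linear = fraction_in_top_decade(linear)
frac_log = fraction_in_top_decade(log_scale)
```

On the linear scale roughly nine out of ten samples end up between 0.1 and 1, while on the log scale each of the four decades gets about a quarter of the samples.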
Hope it helps! If you need further help, feel free to ask.
Hello,
Thank you for your reply. I have a follow-up question about the part where we compare two hyperparameters. You mentioned that we try different combinations to see how they affect model performance. How do we measure model performance? I could think of the time taken for training, computational resources, or accuracy of predictions on test data.
But can we try all of these combinations and then figure out the hyperparameters?
You can evaluate the model with the options you mentioned, but mostly the training and validation losses (plus accuracy) are monitored to see how they change over epochs/batches.
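On trying all the combinations: in practice you try a limited number of randomly sampled combinations and keep the one with the best validation loss. Below is a minimal sketch of that idea. The function `validation_loss` is a made-up stand-in for "train the model with these settings and return its validation loss", and the sampling ranges and the 25 trials are assumptions for illustration.

```python
import math
import random


def validation_loss(alpha, beta):
    """Hypothetical stand-in for training a model and measuring validation loss."""
    return (math.log10(alpha) + 2.0) ** 2 + 10.0 * (beta - 0.9) ** 2


random.seed(0)
trials = []
for _ in range(25):
    # Sample both hyperparameters on a log scale.
    alpha = 10 ** random.uniform(-4, 0)       # learning rate in [0.0001, 1]
    beta = 1 - 10 ** random.uniform(-3, -1)   # beta in [0.9, 0.999]
    trials.append((validation_loss(alpha, beta), alpha, beta))

# Keep the combination with the lowest validation loss.
best_loss, best_alpha, best_beta = min(trials)
```

With a real model, each call to `validation_loss` would be a full training run, which is why the number of trials is kept small and the search is often refined around the best region found.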