First, can you please clarify why the loss is forced to be MSE and nothing else?
Second, and most importantly, no combination of model architecture and hyperparameters, when used with the MSE loss function, is able to satisfy the plot bounds [1e-6, 1, 0, 30] as given in:
MSE is most commonly used in regression tasks, where we build a model of the relationship between the independent variables and a continuous dependent variable.
Cross-entropy loss is often more interpretable in classification tasks, whereas MSE may not always have a straightforward interpretation in regression tasks. MSE is sensitive to the scale of the target values, so the data often needs preprocessing, whereas cross-entropy is computed on predicted class probabilities and is not affected by target scale in that way.
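To make the scaling point concrete, here is a minimal sketch in plain Python (not the assignment's code; the numbers are made up for illustration). Multiplying both targets and predictions by a factor k multiplies the MSE by k squared, which is why the same model can land in very different loss ranges depending on how the series is scaled.

```python
def mse(targets, preds):
    """Mean squared error between two equal-length lists."""
    return sum((t - p) ** 2 for t, p in zip(targets, preds)) / len(targets)

# Hypothetical toy values, chosen only to illustrate the effect.
targets = [1.0, 2.0, 3.0]
preds = [1.5, 2.5, 2.0]

k = 10.0  # rescale the whole problem by a factor of 10
scaled_targets = [k * t for t in targets]
scaled_preds = [k * p for p in preds]

mse_base = mse(targets, preds)
mse_scaled = mse(scaled_targets, scaled_preds)

print(mse_base, mse_scaled)  # the second value is k**2 times the first
```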
Huber loss is used when there are outliers in the data. It combines MSE and MAE, behaving quadratically for small errors and linearly for large ones, which makes it less sensitive to outliers.
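As a rough illustration of that outlier behaviour, the sketch below implements MSE and Huber loss by hand in plain Python. It assumes the usual Huber convention with a threshold delta (0.5 * e^2 when |e| <= delta, otherwise delta * (|e| - 0.5 * delta)); the error values are invented for the example.

```python
def mse(errors):
    """Mean squared error over a list of residuals."""
    return sum(e * e for e in errors) / len(errors)

def huber(errors, delta=1.0):
    """Huber loss: quadratic inside [-delta, delta], linear outside."""
    total = 0.0
    for e in errors:
        a = abs(e)
        if a <= delta:
            total += 0.5 * a * a
        else:
            total += delta * (a - 0.5 * delta)
    return total / len(errors)

# Hypothetical residuals: a clean set, then the same set plus one outlier.
clean = [0.5, -0.5, 1.0, -1.0]
with_outlier = clean + [20.0]

print("clean:       ", mse(clean), huber(clean))
print("with outlier:", mse(with_outlier), huber(with_outlier))
```

A single large residual dominates the MSE because it is squared, while the Huber loss grows only linearly with it, so the two losses end up on very different numeric scales for the same data.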
Did you use ‘mse’ in the create model graded cell?
Huber loss was for the optional exercise, so that learners can understand the significance of the learning rate with respect to the loss function used, but in create model you are supposed to use ‘mse’.
Because of the unittest requirement, MSE had to be used in adjust_learning_rate() as well to obtain the loss vs. learning_rate plot.
An MSE-based adjust_learning_rate() produces the following or a similar outcome at best, where the loss very rarely falls within the (0, 30) range on the Y axis:
The (0, 30) range is the primary reason I think this assignment is intended for the Huber loss function and not MSE.
If someone can confirm getting a better MSE-based loss curve that falls within the (0, 30) range, I’d appreciate it if you could share those hyperparameters with me for testing. (Note: I have submitted and passed the assignment with 100%, so the information will only be for edification.)
Or, if there is a viewpoint that the loss functions in adjust_learning_rate() and create_model() can be different, please provide reasoning on why you think the learning rate is not related to the loss function.
In adjusting the learning rate you are supposed to use Huber loss, and in create model you are supposed to use MSE, based on the hyperparameters and the data being trained. They are two different exercises.
The exercise clearly states “Based on this plot, which learning rate would you choose? You will get to use it on the next exercise”. Yet this concept of dynamic adjustment of learning rate seems to have been completely missed.
Note that it did mention using the learning rate in the next exercise, but that is not mandatory; it was included so that learners can experiment with it.
So based on the instructions, Adam with a learning rate seems to have been advised.
A learning rate of 0.09 is almost 0.1; rather use 0.001 or 0.0001.
But the learning rate hyperparameter is not mandatory in create model.
We will never resolve this issue, because the responses are not on the same side of the coin as the question.
Typical responses have been merely to satisfy the unittest, irrespective of whether that unittest requirement makes sense or not.
Asking to treat the two models independent of each other just to meet the unittest requirement seems antithetical to the topics and lectures in this course.
No clarification has been provided to address the following queries:
The learning rate is of course related to the loss function, depending on the data the model is trained on.
I am not saying the learning rate cannot be used in create model, but the assignment we are talking about didn’t require it; more specifically, it is not a mandatory hyperparameter.
But you can experiment with the learning rate if you want, and see how adjusting it has a different effect depending on the loss function used. That was the whole idea behind giving learners the optional exercise to explore.
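The kind of experiment suggested above can be sketched on a toy problem. The following is plain Python, not the assignment's adjust_learning_rate() or any Keras code: a hypothetical one-weight model y = w * x is trained with ordinary gradient descent for a fixed number of steps at learning rates from 1e-6 to 1, once with MSE and once with Huber loss (delta = 1 is an assumption). The data, the step count, and the helper names are all invented for illustration.

```python
XS = [1.0, 2.0, 3.0, 4.0, 5.0]
YS = [10.0 * x for x in XS]  # toy targets; the ideal weight is 10

def mse_loss(w):
    return sum((w * x - y) ** 2 for x, y in zip(XS, YS)) / len(XS)

def mse_grad(w):
    # d/dw of the mean squared error: grows with the size of the error
    return sum(2.0 * (w * x - y) * x for x, y in zip(XS, YS)) / len(XS)

def huber_loss(w, delta=1.0):
    total = 0.0
    for x, y in zip(XS, YS):
        a = abs(w * x - y)
        total += 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)
    return total / len(XS)

def huber_grad(w, delta=1.0):
    # d/dw of the Huber loss: the error term is clipped to [-delta, delta]
    return sum(max(-delta, min(delta, w * x - y)) * x
               for x, y in zip(XS, YS)) / len(XS)

def sweep(loss_fn, grad_fn, steps=20):
    """Train from w = 0 at each learning rate; return (lr, final loss) pairs."""
    results = []
    for exponent in range(-6, 1):  # 1e-6, 1e-5, ..., 1e-1, 1
        lr = 10.0 ** exponent
        w = 0.0
        for _ in range(steps):
            w -= lr * grad_fn(w)
        results.append((lr, loss_fn(w)))
    return results

mse_results = sweep(mse_loss, mse_grad)
huber_results = sweep(huber_loss, huber_grad)

for (lr, m), (_, h) in zip(mse_results, huber_results):
    print(f"lr={lr:g}  mse={m:.4g}  huber={h:.4g}")
```

In this toy setup the MSE gradient scales with the error, so the largest learning rates make the MSE run blow up, while the clipped Huber gradient keeps the same learning rates bounded. That is one way to see why a learning rate read off a Huber-based plot need not transfer to an MSE-trained model, and why the two curves sit in different loss ranges.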