Assignment 3: Which loss should be used?

amp1590 · February 7, 2025, 1:54am

Hi,

At first I used Huber loss for this assignment, but it was not accepted: one of the errors said I was supposed to use “MeanSquaredError”.

Anyway, with Huber loss the learning rate plot showed valid curves, but with MeanSquaredError loss the plot was empty! I went ahead and picked a learning rate anyway, and my assignment passed. I’m very surprised that it worked when the learning rate plot showed nothing!

Any idea on this would be highly appreciated. Thank you.

Deepti_Prasad · February 7, 2025, 2:13am

Hi @amp1590

The section that uses Huber loss was only provided to help learners understand how the learning rate affects training speed. The “adjust learning rate” exercise comes with a note:

Notice that this is only changing the learning rate during the training process to give you an idea of what a reasonable learning rate is and should not be confused with selecting the best learning rate, this is known as hyperparameter optimization and it is outside the scope of this course.

However, there is another reason the assignment requires MSE instead of Huber loss:

The “delta” value in Huber loss determines where the loss function transitions from quadratic to linear, requiring careful selection based on the data, which can be challenging in time series where patterns might be complex and evolving.

Huber loss also down-weights large errors, treating them like outliers. In a time series, seasonality and trends make it hard to tell a genuine outlier from a real change in the pattern.

Remember that time series come with seasonality as well as noise, which makes MSE or MAE a better loss choice when it comes to detecting changes in seasonality.
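To make the role of “delta” concrete, here is a minimal sketch in plain Python (not the assignment's code). The function names `huber` and `squared_error` are made up for illustration, and the formula is the standard per-sample Huber definition with a default delta of 1.0. It shows the loss switching from quadratic to linear at delta, and how much larger the squared error gets for big errors.

```python
def huber(error, delta=1.0):
    """Per-sample Huber loss: quadratic for |error| <= delta, linear beyond it."""
    abs_err = abs(error)
    if abs_err <= delta:
        return 0.5 * error ** 2
    return delta * (abs_err - 0.5 * delta)


def squared_error(error):
    """Per-sample squared error, the term that MSE averages."""
    return error ** 2


# Illustrative error values only, not taken from any real training run.
for err in (0.5, 1.0, 5.0, 20.0):
    print(f"error={err:5.1f}  huber={huber(err):6.3f}  squared={squared_error(err):7.2f}")
```

For an error of 20, the Huber value is 19.5 while the squared error is 400, so the same model can report losses on very different scales depending on which loss is chosen.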

amp1590 · February 8, 2025, 12:11pm

Hi @Deepti_Prasad ,

Thank you so much for explaining why MSE is the better choice here. My follow-up question: when I first applied Huber loss, I got a plot of loss against learning rate. But with MSE loss, nothing was plotted at all (picture attached). Is this right? How can I decide on a learning rate when the plot shows no values? I went ahead with a learning rate I got from the Huber loss plot, and that worked, but given this empty plot that choice is essentially arbitrary. Any input on this would be a great help. Thank you.

Deepti_Prasad · February 8, 2025, 3:05pm

Hi @amp1590

I am glad that you noticed this and asked. But perhaps you missed that the loss must have become NaN (not a number). When the loss becomes NaN, the loss-versus-learning-rate curve either spikes or becomes a flat line.

In my training run, the loss started at 274 in the first epoch and became NaN at epoch 44. This is consistent with the explanation above: on the loss-versus-learning-rate graph, this appears as a sudden jump or a flat line at the point where the NaN loss occurs, because the weights can no longer be updated meaningfully once the loss is NaN.

I am sharing a picture of the final output of my training run with its graph, where the loss-versus-learning-rate curve shows a distinct, tiny line at 30 and eventually becomes a flat line.
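As a rough illustration of how such a plot can end up empty, here is a minimal sketch in plain Python. Everything in it is an assumption made for the example: the sweep formula in `lr_at_epoch`, the loss values, and the y-axis limit of 30 are hypothetical, not the assignment's actual settings or anyone's real training output. Line plots in libraries such as matplotlib leave gaps at NaN values, and any finite point above a fixed axis limit is drawn off-screen, so both effects can hide the curve.

```python
import math


def lr_at_epoch(epoch, base_lr=1e-8, epochs_per_decade=20):
    """Hypothetical sweep: the learning rate grows 10x every `epochs_per_decade` epochs."""
    return base_lr * 10 ** (epoch / epochs_per_decade)


def finite_points(lrs, losses):
    """Keep only the (lr, loss) pairs whose loss is a finite number."""
    return [(lr, loss) for lr, loss in zip(lrs, losses) if math.isfinite(loss)]


# Made-up loss history: it falls, rises again, then becomes NaN.
losses = [274.0, 150.0, 90.0, 60.0, 75.0, float("nan"), float("nan")]
lrs = [lr_at_epoch(epoch) for epoch in range(len(losses))]

points = finite_points(lrs, losses)

# First epoch at which the loss is NaN (None if it never happens).
first_nan_epoch = next(
    (epoch for epoch, loss in enumerate(losses) if math.isnan(loss)), None
)

# Learning rate at which the finite loss was lowest during the sweep.
lowest_loss_lr, lowest_loss = min(points, key=lambda point: point[1])

# Points that would be visible if the y-axis were capped at a hypothetical 30.
y_max = 30.0
visible = [point for point in points if point[1] <= y_max]

print("first NaN epoch:", first_nan_epoch)
print("lowest finite loss:", lowest_loss, "at lr", lowest_loss_lr)
print("points visible below y_max:", len(visible))
```

In this toy history no point is visible under the cap, even though five epochs have finite losses. Printing the recorded losses, or widening the axis limits, is one way to check whether a curve is missing because of NaN values or only because it lies outside the plotted range.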

jeffp · March 19, 2025, 11:49pm

As a former instructor at a major university I have to say the Adjust Learning Rate section, though stated as optional, sets the student up for failure and needless toil. And there is an element of unfairness to it.

This week's lesson is RNNs for Time Series, and both labs make use of Huber and nothing else. From that it can be reasonably inferred that we are learning a new loss function that is compatible with and useful for the overall RNN lesson we are being taught. And from there it is reasonable to infer that we should know it and use it.

Furthermore, as it pertains to techniques to discover a proper learning rate, we can understand that hyper-parameter optimization, in the large sense, is outside the scope of this course but this does not necessarily preclude effectively using the technique that has been demonstrated to us several times now. One might even say it has been drilled into us.

And finally, why provide an optional section in an assignment, which we have already done several times, when there is no benefit to the student? For example, finding an optimal rate, even a weak one, that improves their submission/grade, or extra credit. It seems to be just noise in the assignment.

I respectfully suggest it be officially incorporated into the assignment to better flesh it out (the assignment is otherwise kind of light), which seems to be, clumsily, what it is trying to do while remaining optional.

–J
