Really poor loss - help with debugging

Hello, all. I passed all the tests from exercises 1 to 9. When I built the training step for exercise 10, it runs fine, but the loss is terribly large. I reviewed my previous exercises and am having difficulty figuring out my mistake. May I ask for some suggestions? Where should I look, and what might I be doing wrong? Any help is appreciated. Thanks.

Start fine-tuning!
batch 0 of 100, loss=9732423.0
batch 10 of 100, loss=9670611.0
batch 20 of 100, loss=9608616.0
batch 30 of 100, loss=10025403.0
batch 40 of 100, loss=9175880.0
batch 50 of 100, loss=9878275.0
batch 60 of 100, loss=9364504.0
batch 70 of 100, loss=9010551.0
batch 80 of 100, loss=9743804.0
batch 90 of 100, loss=9183414.0
Done fine-tuning!

@qixuanhou
Since you passed all the tests through exercise 9, based on the information you have provided I can only suggest checking your train_step_fn in exercise 10.
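
For reference, here is a minimal sketch of what the training step usually looks like in this assignment, following the eager few-shot training tutorial from the TF Object Detection API that the exercise is based on. Names such as detection_model, optimizer, to_fine_tune and batch_size are assumptions about your notebook and may differ, so treat this as a checklist rather than a drop-in solution.

import tensorflow as tf

# Assumed to already exist in the notebook: detection_model (built from the
# pipeline config with the restored checkpoint), optimizer, batch_size, and
# to_fine_tune (the list of head variables selected for fine-tuning).

@tf.function
def train_step_fn(image_tensors, groundtruth_boxes_list, groundtruth_classes_list):
    shapes = tf.constant(batch_size * [[640, 640, 3]], dtype=tf.int32)

    # Ground truth must be provided before calling detection_model.loss().
    detection_model.provide_groundtruth(
        groundtruth_boxes_list=groundtruth_boxes_list,
        groundtruth_classes_list=groundtruth_classes_list)

    with tf.GradientTape() as tape:
        # Preprocess each image and stack them into one batch.
        preprocessed_images = tf.concat(
            [detection_model.preprocess(image_tensor)[0]
             for image_tensor in image_tensors], axis=0)
        prediction_dict = detection_model.predict(preprocessed_images, shapes)

        # Total loss is the sum of the localization and classification losses.
        losses_dict = detection_model.loss(prediction_dict, shapes)
        total_loss = (losses_dict['Loss/localization_loss'] +
                      losses_dict['Loss/classification_loss'])

    # Compute and apply gradients only for the variables being fine-tuned.
    gradients = tape.gradient(total_loss, to_fine_tune)
    optimizer.apply_gradients(zip(gradients, to_fine_tune))
    return total_loss

Slips that tend to produce a huge loss without raising any error include skipping preprocess() on the images, passing raw class ids instead of one-hot ground-truth class tensors, and summing the wrong entries of losses_dict.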

Thanks for your reply. For exercise 6.1, the expected output has "0x7fefac014710" at the end. My output has the same wording, but the hex is different from "0x7fefac014710". Do you think that's an issue?

'_base_tower_layers_for_heads': DictWrapper({'box_encodings': ListWrapper([]), 'class_predictions_with_background': ListWrapper([])}),'_box_prediction_head': <object_detection.predictors.heads.keras_box_head.WeightSharedConvolutionalBoxHead at 0x7fefac014710>, ... 

Getting a different hex value is not an issue; it is just the memory address of the Python object, which changes from run to run.

I had the same issue of a high loss value that barely decreased. Although I'm sure you have solved it by now, for anyone reading this forum at a later date: consider reviewing Exercises 6.2 & 6.3, where the checkpoint is restored.
No error is raised by those cells, but incorrect code there will still affect the output of Exercise 10.
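
For comparison, this is roughly the restore pattern those cells are built on, taken from the eager few-shot training tutorial in the TF Object Detection API. Variable names such as detection_model and checkpoint_path are assumptions about the notebook; the important part is which pieces get wrapped in the "fake" checkpoints.

import tensorflow as tf

# Assumed to exist already: detection_model (freshly built from the pipeline
# config) and checkpoint_path (the downloaded pretrained checkpoint).

# Restore the box regression head but NOT the classification head: the new
# class has to be learned from scratch, so _prediction_heads is left out.
fake_box_predictor = tf.train.Checkpoint(
    _base_tower_layers_for_heads=detection_model._box_predictor._base_tower_layers_for_heads,
    _box_prediction_head=detection_model._box_predictor._box_prediction_head)
fake_model = tf.train.Checkpoint(
    _feature_extractor=detection_model._feature_extractor,
    _box_predictor=fake_box_predictor)
ckpt = tf.train.Checkpoint(model=fake_model)
ckpt.restore(checkpoint_path).expect_partial()

# Run a dummy image through the model so the restored variables are created.
image, shapes = detection_model.preprocess(tf.zeros([1, 640, 640, 3]))
prediction_dict = detection_model.predict(image, shapes)
_ = detection_model.postprocess(prediction_dict, shapes)

Restoring the wrong pieces here will usually not raise an error, but the model then starts from badly initialized weights, which can show up exactly as a loss that stays huge and barely decreases.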

It is also possible, if you annotated the bounding boxes yourself (option 1, Exercise 2), that the boxes are not annotated well.
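
If you want a quick sanity check on your own annotations, something along these lines can help (the helper name and variables are made up for illustration). The Object Detection API expects ground-truth boxes as [ymin, xmin, ymax, xmax] normalized to [0, 1], so pixel coordinates or a swapped coordinate order will quietly inflate the loss.

import numpy as np

def check_gt_boxes(gt_boxes_np):
    # Hypothetical helper: gt_boxes_np is an array of shape [num_boxes, 4]
    # holding [ymin, xmin, ymax, xmax] for one image.
    boxes = np.asarray(gt_boxes_np, dtype=np.float32)
    if boxes.min() < 0.0 or boxes.max() > 1.0:
        print('Values outside [0, 1]: these look like pixel coordinates; '
              'divide y by the image height and x by the width.')
    if np.any(boxes[:, 2] <= boxes[:, 0]) or np.any(boxes[:, 3] <= boxes[:, 1]):
        print('Some boxes have max <= min: check the coordinate order.')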
