C3W2 Assignment Zombie Detector

In the “Run the training loop” section of the assignment, the loss does not decay as expected.
This is the output I am getting:
Start fine-tuning!
batch 0 of 100, loss=1.8182377
batch 10 of 100, loss=1.811907
batch 20 of 100, loss=1.8056116
batch 30 of 100, loss=1.7993518
batch 40 of 100, loss=1.7931268
batch 50 of 100, loss=1.7869358
batch 60 of 100, loss=1.7807791
batch 70 of 100, loss=1.7746613
batch 80 of 100, loss=1.7685854
batch 90 of 100, loss=1.7625492
Done fine-tuning!
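To put a number on "does not decay as expected", the log above can be parsed and the relative drop computed. This is a minimal sketch: `parse_losses` and `relative_drop` are hypothetical helpers, not part of the assignment, and the regex assumes the exact `batch N of M, loss=X` line format shown.

```python
import re

# The fine-tuning log from the post above, copied verbatim.
training_log = """\
batch 0 of 100, loss=1.8182377
batch 10 of 100, loss=1.811907
batch 20 of 100, loss=1.8056116
batch 30 of 100, loss=1.7993518
batch 40 of 100, loss=1.7931268
batch 50 of 100, loss=1.7869358
batch 60 of 100, loss=1.7807791
batch 70 of 100, loss=1.7746613
batch 80 of 100, loss=1.7685854
batch 90 of 100, loss=1.7625492
"""

def parse_losses(log_text):
    """Return (batch_index, loss) pairs from lines like 'batch 10 of 100, loss=1.81'."""
    pattern = re.compile(r"batch (\d+) of \d+, loss=([0-9.]+)")
    return [(int(m.group(1)), float(m.group(2))) for m in pattern.finditer(log_text)]

def relative_drop(losses):
    """Fractional decrease from the first logged loss to the last one."""
    first, last = losses[0][1], losses[-1][1]
    return (first - last) / first

losses = parse_losses(training_log)
print(f"Loss fell {relative_drop(losses):.1%} between batch {losses[0][0]} and batch {losses[-1][0]}")
```

For this log the loss falls only about 3% over 90 batches, by a nearly constant 0.006 every 10 batches, which is the slow, steady drift of a model that is barely learning.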

I went through my notebook several times but was unable to find the issue. I have also tried tuning the hyperparameters and changing the optimizer, but the issue persists.

Please help me find the issue.

Did you try looking at older threads for a similar issue, @hj320?

There are several similar threads. When your loss is not decreasing, the model is not learning anything new, so check: is the box predictor being assigned and restored correctly?

Check these two threads! There are more threads that explain this further; explore them, and you will learn a lot from the comments.

Let me know if you are not able to resolve it!

Regards
DP

Hi DP,

Thanks for the response.

I have gone through these threads, but they didn't make any difference.
The loss still behaves as I showed earlier.

Not sure where I am going wrong.

Can you DM me your checkpoint code?

Hi @hj320

Your checkpoint_path is incorrect:
checkpoint_path = "models/research/object_detection/test_data/ckpt-33264"
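A TF2 checkpoint prefix such as `ckpt-0` is not itself a file on disk; it is the shared prefix of a `ckpt-0.index` file and one or more `ckpt-0.data-XXXXX-of-YYYYY` shards. The hypothetical helper below (plain Python, not part of the assignment or of TensorFlow) is a minimal sketch that checks those files are present before you pass the prefix to a restore call.

```python
from pathlib import Path

def checkpoint_prefix_exists(prefix):
    """Return True if `prefix` (e.g. '.../ckpt-0') has an .index file and
    at least one .data-* shard next to it, as a TF2 checkpoint should."""
    p = Path(prefix)
    index_file = p.with_name(p.name + ".index")
    if not index_file.is_file():
        # No index file means there is no checkpoint under this prefix.
        return False
    # Data shards are named like 'ckpt-0.data-00000-of-00001'.
    return any(p.parent.glob(p.name + ".data-*"))

# The path from the post above; prints False unless those files exist locally.
print(checkpoint_prefix_exists("models/research/object_detection/test_data/ckpt-33264"))
```

It only confirms the files are there; it cannot tell you whether they belong to the model you are fine-tuning.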

The hints and instructions explain how to set checkpoint_path. The instructions mention ckpt-0, but you are using ckpt-33264, and your path clearly needs more correction than that. Check the image below carefully:

The path should first lead to the folder holding the checkpoint contents, then to the model checkpoint itself. You are also missing the point that test_data has a checkpoint folder containing all the ckpt files! (I have literally told you how to write the path here, provided your pipeline_config is set correctly.)
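Assuming the extracted checkpoint folder follows the usual TensorFlow layout, it holds, next to the `ckpt-*` files, a small text file named `checkpoint` that records which prefixes exist. `tf.train.latest_checkpoint(checkpoint_dir)` reads that file for you; the dependency-free sketch below (hypothetical helper `list_checkpoints`) shows what it contains, assuming the standard `model_checkpoint_path: "..."` and `all_model_checkpoint_paths: "..."` lines.

```python
import re
from pathlib import Path

# Matches lines such as: model_checkpoint_path: "ckpt-0"
_LINE = re.compile(r'\s*(model_checkpoint_path|all_model_checkpoint_paths):\s*"(.*)"')

def list_checkpoints(checkpoint_dir):
    """Return (latest_prefix, all_prefixes) recorded in checkpoint_dir/checkpoint.

    Relative entries are resolved against checkpoint_dir."""
    directory = Path(checkpoint_dir)
    latest, all_prefixes = None, []
    for line in (directory / "checkpoint").read_text().splitlines():
        match = _LINE.match(line)
        if match is None:
            continue  # skip timestamps and any other fields
        key, value = match.groups()
        full_path = str(directory / value)
        if key == "model_checkpoint_path":
            latest = full_path
        else:
            all_prefixes.append(full_path)
    return latest, all_prefixes
```

Whatever prefix this reports is what a checkpoint path pointing into that folder should end with.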

What I also want you to go back and check is where you define the path to the .config file for SSD ResNet50 v1 640x640 (read the instruction below carefully to verify you wrote it correctly), since that path is used to read the model configuration.
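A cheap way to confirm the config path before building the model is to open the file and pull out one field. The sketch below is a hypothetical sanity check only (`read_num_classes` is an illustrative name, and the regex assumes a text-format `num_classes: N` line); the assignment itself reads the file with the Object Detection API, typically via `config_util.get_configs_from_pipeline_file`, which parses the full protobuf.

```python
import re
from pathlib import Path

def read_num_classes(pipeline_config_path):
    """Return the first `num_classes` value found in a pipeline .config file."""
    path = Path(pipeline_config_path)
    if not path.is_file():
        # A wrong path fails here, before any model code runs.
        raise FileNotFoundError(f"No pipeline config at {path}")
    match = re.search(r"num_classes:\s*(\d+)", path.read_text())
    if match is None:
        raise ValueError(f"No num_classes field in {path}")
    return int(match.group(1))
```

If this raises `FileNotFoundError`, the config path is wrong, and so is anything built from it.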

Let me know if it resolves your issue.

Regards
DP

Hi DP,

Thanks for pointing out the checkpoint path issue.
It turns out I was downloading the checkpoint tar file from the wrong source, which gave me the ckpt-33264 files.

The issue was resolved after downloading and using the correct checkpoint files.
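Since the fix here was the download source, one quick check on a checkpoint archive is to list the `ckpt-N` prefixes inside it before extracting. The sketch below uses only the standard-library `tarfile` module; `checkpoint_prefixes_in_tar` is a hypothetical helper, and it assumes the archive stores TF2-style `ckpt-N.index` files.

```python
import re
import tarfile

def checkpoint_prefixes_in_tar(tar_path):
    """Return the sorted checkpoint prefixes (e.g. 'ckpt-0') whose .index
    files appear anywhere inside the tar archive at `tar_path`."""
    prefixes = set()
    # Mode "r" auto-detects gzip/bz2/xz compression.
    with tarfile.open(tar_path, "r") as tar:
        for member in tar.getmembers():
            filename = member.name.rsplit("/", 1)[-1]
            match = re.fullmatch(r"(ckpt-\d+)\.index", filename)
            if match:
                prefixes.add(match.group(1))
    return sorted(prefixes)
```

The instructions mention ckpt-0, so an archive that yields only ckpt-33264 is a sign it came from a different source.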

Thanks a lot for your help!

Regards,
Harsh


Great, @hj320! You're getting better at debugging your code with the hints. Most solutions are already there in the instructions and in the output (the incorrect result or error you get); for the other half, older posts would resolve your issue. As a last resort, we (mentors) are always there!!!

Keep Learning!!!
Happy to help.

Regards
DP
