Training not converging to zero on the Zombie Detector

Maur_cd · November 22, 2024, 4:07pm

Hello,

I’m working on this week’s assignment. I got everything running up to the training loop, but the loss is not converging. I don’t know where the error is. Please help.

Deepti_Prasad · November 22, 2024, 5:29pm

hi @Maur_cd

Kindly refer to the threads below:

checkpoint issue

defining path

Let me know if the issue still persists!

Maur_cd · November 29, 2024, 12:37am

Hi again Deepti,

I keep having the same problem. The loss goes to really high numbers after the first iteration. I ran it a few more times, and it eventually starts to go down.

So I think I still have a problem with the checkpoint. I believe I’m following the instructions correctly, so I’m probably misreading them. Last week the training was running quite slowly, so I figured there had to be another mistake. I think I was not making the checkpoints right. I made some corrections, and now the code runs fast, but it runs with an error message:

Sorry to bother you again, but I’m stuck.

Deepti_Prasad · November 29, 2024, 12:29pm

@Maur_cd please share a screenshot of the error, not the code.

I will have to check how you corrected your checkpoint.

You can send a screenshot of the checkpoint correction by personal DM.

Deepti_Prasad · December 4, 2024, 4:30pm

hi @Maur_cd

I lost your notification in my inbox among other messages.

So your code for the pipeline config needs correction.
The instruction says:
Navigate to models/research/object_detection/configs/tf2. The folder has multiple .config files.

But in that line of code you put content first and then models.

Next, in the restore-checkpoint step, the instruction says:
Please set checkpoint_path to the path to the full path models/…/ckpt-0

Notice there is no content at the beginning. The path starts with models; everything you wrote after that is correct.

Your download-checkpoint code is correct.
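
Here is a minimal sketch of why a leading content could matter. It assumes the notebook's working directory is /content (the usual Colab default) and that the path was written without a leading slash. Neither is confirmed in this thread, and the helper name resolve is made up for illustration.

```python
from pathlib import PurePosixPath

# Assumption: the notebook runs with /content as its working directory.
ASSUMED_CWD = PurePosixPath("/content")


def resolve(path_str, cwd=ASSUMED_CWD):
    """Return the absolute location a path string refers to (hypothetical helper)."""
    path = PurePosixPath(path_str)
    # Absolute paths are used as-is; relative ones are joined onto the working directory.
    return path if path.is_absolute() else cwd / path


as_instructed = resolve("models/research/object_detection/configs/tf2")
with_prefix = resolve("content/models/research/object_detection/configs/tf2")

print(as_instructed)  # /content/models/research/object_detection/configs/tf2
print(with_prefix)    # /content/content/models/research/object_detection/configs/tf2
```

Under those assumptions the second path points to a folder that does not exist, while an absolute /content/models/… path would resolve to the same place as the relative models/… form.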

Regards
DP

Maur_cd · December 5, 2024, 12:27am

Hi, thank you for your reply.

Fortunately, I was able to find the problem. When I was training the model, the first loss would be around 1 and the rest would get really high, so I thought about it, and of course I had the wrong value for the learning rate (0.1 instead of 0.01).
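
For anyone hitting the same symptom, the effect can be reproduced on a toy problem. This is only a sketch with plain gradient descent on a one-parameter quadratic loss, not the assignment's detection model or its optimizer. The curvature and starting weight are arbitrary values chosen so that the first loss is about 1.

```python
def run_gradient_descent(learning_rate, steps=5, curvature=50.0, w=0.2):
    """Minimize loss(w) = 0.5 * curvature * w**2; return the loss before each update."""
    losses = []
    for _ in range(steps):
        losses.append(0.5 * curvature * w ** 2)
        grad = curvature * w          # derivative of the loss with respect to w
        w -= learning_rate * grad     # plain gradient descent update
    return losses


too_high = run_gradient_descent(0.1)     # each step overshoots the minimum
reasonable = run_gradient_descent(0.01)  # each step moves closer to the minimum

print("lr=0.1 :", [round(x, 3) for x in too_high])
print("lr=0.01:", [round(x, 3) for x in reasonable])
```

With the larger rate every update overshoots the minimum, so the loss starts near 1 and grows at each step. With the smaller rate it shrinks at each step.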

I thought the problem was elsewhere, since I was getting a handling error when running the trainer. I got the file path using the Colab option to copy the path from the file. I haven’t figured out why I’m getting the handling error. I also got it the following week, but again the model worked well, although it took around two to three hours to train for around 10 epochs. Is that normal?

Anyway, thank you!

Deepti_Prasad · December 5, 2024, 12:46am

Yes, the course 3 and course 4 assignments take longer to train.

You can share a screenshot of the error.

Maur_cd · December 5, 2024, 12:49am

This is the message I’m getting while training.

Deepti_Prasad · December 5, 2024, 12:52am

Okay, in the cell you are running, the 3rd line of code is throwing the error.

Maur_cd · December 5, 2024, 1:16am

That cell was given, and the third line has to do with the length of some list, so I don’t know what the problem would be. Colab also shows the error while it is running the cell I did write. I’ll have to dig deeper to see what the problem is. But as I told you, it doesn’t seem to be too big a deal, since I was able to pass the assignment. I’ll try to figure it out later on.

Deepti_Prasad · December 5, 2024, 2:55am

@Maur_cd can you send me a complete screenshot of the error by DM, so I can see which code is throwing it?

It is pointing towards the call function; line 3 is probably using the call function in a different way.
