Hello,
I’m working on this week’s assignment, and I got everything running up to the training loop, but the loss is not converging. I don’t know where the error is. Please help.
Hi again Deepti,
I keep having the same problem. The loss goes to really high numbers after the first iteration. I ran it a few more times, and it eventually starts to go down.
I also think I still have a problem with the checkpoint. I believe I’m following the instructions correctly, so I’m probably misreading them. Last week the training was running quite slowly, so I figured there had to be another mistake. I think I was not creating the checkpoints correctly. I made some corrections, and now the code runs fast, but it runs with an error message:
Sorry to bother you again, but I’m stuck.
@Maur_cd please share a screenshot of the error, not the code.
I will have to check how you corrected your checkpoint.
You can send a screenshot of the checkpoint correction by personal DM.
hi @Maur_cd
I lost your notification in my inbox among other messages.
models/research/object_detection/configs/tf2
The folder has multiple .config files. But in that line of code you put content first and then models.
Notice there is no content at the beginning. Starting with models, followed by the rest of what you wrote, is correct.
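To check which form of the path works in your notebook, you can list the .config files directly. This is only a rough sketch: `list_config_files` is a hypothetical helper name, not part of the assignment, and the directory is just the path mentioned above. A path that does not start with `/` is resolved against the notebook's current working directory, and the helper prints the absolute directory it actually looked in if that directory is missing.

```python
import glob
import os


def list_config_files(config_dir):
    """Return the sorted .config files found directly inside config_dir.

    Hypothetical helper for debugging paths; not part of the assignment code.
    """
    # Resolve the path against the current working directory so the
    # error message shows exactly where Python looked.
    abs_dir = os.path.abspath(config_dir)
    if not os.path.isdir(abs_dir):
        raise FileNotFoundError(f"Directory not found: {abs_dir}")
    return sorted(glob.glob(os.path.join(abs_dir, "*.config")))


# Path taken from the post above, with no leading "content".
CONFIG_DIR = "models/research/object_detection/configs/tf2"

try:
    for path in list_config_files(CONFIG_DIR):
        print(path)
except FileNotFoundError as err:
    # Shows the absolute location that was checked.
    print(err)
```

If the directory is not found, the printed absolute path usually makes it clear whether an extra or missing leading folder is the cause.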
Your download checkpoint code is correct.
Regards
DP
Hi, thank you for your reply.
Fortunately, I was able to find the problem. When I was training the model, the first loss would be around 1, and the rest would go really high. I thought about it, and of course, I had the wrong value for the learning rate (0.1 instead of 0.01).
I thought the problem was elsewhere, since I was getting a handling error when running the trainer. I got the file path using the Colab option to copy the path from the file. I haven’t figured out why I’m getting a handling error. I also got it the following week, but again the model worked well, although it took around two to three hours to train for around 10 epochs. Is that normal?
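For anyone hitting the same symptom, here is a minimal sketch of why a learning rate that is ten times too large can make the loss grow instead of shrink. It uses a toy quadratic loss with an assumed curvature of 25, not the assignment's detection model, so the numbers are illustrative only. The function name and parameters are made up for this example.

```python
def run_gradient_descent(learning_rate, steps=10, curvature=25.0, w0=1.0):
    """Minimize the toy loss 0.5 * curvature * w**2 with plain gradient descent.

    Returns the loss recorded before each update step.
    """
    w = w0
    losses = []
    for _ in range(steps):
        losses.append(0.5 * curvature * w ** 2)
        grad = curvature * w          # derivative of the toy loss
        w -= learning_rate * grad     # standard gradient descent update
    return losses


high = run_gradient_descent(learning_rate=0.1)
low = run_gradient_descent(learning_rate=0.01)

print("lr=0.1 :", [round(x, 2) for x in high[:4]])
print("lr=0.01:", [round(x, 2) for x in low[:4]])
```

Each update multiplies `w` by `1 - learning_rate * curvature`. With 0.1 that factor is -1.5, so the weight overshoots further on every step and the loss climbs. With 0.01 the factor is 0.75 and the loss shrinks steadily. A real model with a different optimizer will not follow these exact numbers, but a loss that jumps right after the first iteration is a common sign that the step size is too large.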
Anyway, thank you!
Yes, training will take longer for the Course 3 and Course 4 assignments.
You can share a screenshot of the error.
This is the message I’m getting while training.
Okay, in the cell you are running, the third line of code is throwing the error.
That cell was given. The third line has to do with the length of some list, so I don’t know what the problem would be. I also get the error while Colab shows it is running the cell I wrote myself. I’ll have to dig deeper to see what the problem is. But as I told you, it doesn’t seem to be too big of a deal, since I was able to pass the assignment. I’ll try to figure it out later on.
@Maur_cd can you send me a complete screenshot with the error by DM, so I can see which line of code is throwing it?
It is pointing to the call function. It probably indicates that the call function is being used in a different way on code line 3.