I learned image 1 from Week 1 of Course 5 (Sequence Models) of the Deep Learning Specialization, and image 2 from Week 1 of Course 3 (Sequence Models) of the NLP Specialization.
(image 1)
(image 2)
The cost function looks different in the two courses above. Which one is right? Thanks for your help.
There is some flexibility in how you define the loss in cases like this, and there are many possible RNN architectures. The differences here are fairly straightforward. In the DLS C5 case, it looks like the classification at each timestep is binary, so you have the binary cross-entropy loss, and the per-timestep losses are summed across the timesteps. In the NLP case, it is a multiclass problem, so you have the softmax (categorical) version of cross-entropy, and they choose to average the per-timestep losses over the timesteps instead of summing them. For a sequence of fixed length T, the average is just the sum multiplied by 1/T, so both versions are minimized by the same parameters; the choice only scales the gradients, which is equivalent to rescaling the learning rate. As long as you are consistent in how you do that in a given case, you can choose either method.
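To make the comparison concrete, here is a minimal NumPy sketch, assuming the sigmoid/softmax outputs have already been computed and the labels are 0/1 (binary case) or one-hot (multiclass case). The function names and toy numbers are hypothetical and not taken from either course's notebooks. It shows two things: binary cross-entropy is the two-class special case of softmax cross-entropy, and averaging over the T timesteps gives exactly the summed loss divided by T.

```python
import numpy as np

# Hypothetical helper names for illustration; not taken from either course.

def binary_ce_sum(p, y):
    """DLS-style loss: binary cross-entropy at each timestep, summed over T.
    p: shape (T,), predicted P(y=1) per timestep; y: shape (T,), labels in {0, 1}."""
    per_step = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return per_step.sum()

def categorical_ce_mean(y_hat, y):
    """NLP-style loss: softmax cross-entropy at each timestep, averaged over T.
    y_hat: shape (T, K), softmax outputs; y: shape (T, K), one-hot labels."""
    per_step = -(y * np.log(y_hat)).sum(axis=1)
    return per_step.mean()

# Toy sequence with T = 3 timesteps (made-up numbers).
p = np.array([0.9, 0.2, 0.7])
y = np.array([1, 0, 1])

# Re-express the binary problem as K = 2 classes: columns are [P(y=0), P(y=1)].
y_hat2 = np.stack([1 - p, p], axis=1)
y2 = np.stack([1 - y, y], axis=1)

total = binary_ce_sum(p, y)
average = categorical_ce_mean(y_hat2, y2)
T = len(y)

print(f"sum over timesteps:     {total:.4f}")
print(f"average over timesteps: {average:.4f}")
print(f"sum / T:                {total / T:.4f}")
```

Because the two totals differ only by the constant factor 1/T for a given sequence length, their gradients point in the same direction and the minimizing parameters are the same; averaging mainly keeps the loss on a scale that does not grow with sequence length.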
Thank you for the explanation.