The answer options for one quiz question suggested that "mini-batch gradient descent" and "getting more training data" could help find parameter values that give a small cost function value.
First, I found this question very tricky. Another question indicated that mini-batch gradient descent is not necessarily faster than batch gradient descent, and I concluded that, computationally speaking, both approaches have the same time complexity: without memory limits they take the same time, and with memory limits, only batch gradient descent might be infeasible. The problem with mini-batch is that it can lead to a worse cost J after one mini-batch is processed, and even after a whole epoch is processed. So I don't think it is a strong option for getting a small value of the cost J.
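To make that point concrete, here is a minimal sketch on a toy linear-regression problem (all names and numbers are hypothetical, just for illustration) that runs mini-batch gradient descent and tracks the full-dataset cost after every mini-batch update:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 2x + noise
X = rng.normal(size=200)
y = 2.0 * X + 0.5 * rng.normal(size=200)

def full_cost(w):
    # Mean squared error over the whole training set
    return np.mean((w * X - y) ** 2)

w, lr, batch = 0.0, 0.05, 20
costs = [full_cost(w)]
for epoch in range(5):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch):
        b = idx[start:start + batch]
        # Gradient estimated from the mini-batch only
        grad = 2.0 * np.mean((w * X[b] - y[b]) * X[b])
        w -= lr * grad
        costs.append(full_cost(w))

print(costs[0], costs[-1])
```

Because each gradient is estimated from a random mini-batch, you will typically see some individual updates where the full cost briefly rises, even though the overall trend across epochs is downward.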
Second, the question stated that "gradient descent in a deep network is taking excessively long to find". How can getting more training data help to get a small value for the cost J? More data in a batch means more time to process that batch, doesn't it?
Yes, some of the quiz questions get into pretty subtle distinctions and are sometimes worded in a pretty complicated way.
Generally speaking, we are supposed to treat answers to quiz questions the same way that we treat the programming assignments, in that we don’t give out the correct answers on a public thread, but we can talk about the issues in general.
It is true that minibatch is not guaranteed to work better than batch in every possible case, but generally it does work better in the sense that you update the parameters after each minibatch. That sword has two edges, of course: you get more frequent updates, but you also get more statistical noise in the updates because you are randomly sampling the minibatches. But minibatch gradient descent is the default way people do things these days, meaning that it definitely is worth a try.
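As a back-of-the-envelope illustration of the update-frequency point (the dataset and batch sizes below are hypothetical):

```python
m = 50_000          # training examples (hypothetical)
batch_size = 64

# Parameter updates per epoch
batch_updates = 1                          # full-batch: one update per epoch
minibatch_updates = -(-m // batch_size)    # ceiling division: one per mini-batch

print(batch_updates, minibatch_updates)
```

So in one pass over the data, mini-batch gradient descent gets hundreds of parameter updates where full-batch gets one, at the price of each update being based on a noisy gradient estimate.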
I agree that adding more data is also not likely to speed up convergence.
I have not taken this quiz in a couple of years and will have to go back and try again to see if I can find the question you are asking about. But (again) just speaking generally, my first question would be whether we are doing one of the more sophisticated algorithms like RMSprop, Momentum or Adam. If convergence is slow, that's the first thing I would think of to try. If even that doesn't work, then maybe the problem is that the model we've chosen just isn't up to the task, meaning it just has too much bias. So the next thing to try would be a more powerful and expressive model (more layers, more neurons per layer and maybe different activation functions).
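For reference, here is a minimal numpy sketch of the Adam update rule on a toy objective (the hyperparameter defaults are the commonly cited ones; `adam_step` and the objective are illustrative, not from any course assignment):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update with bias correction (t starts at 1)."""
    m = b1 * m + (1 - b1) * grad            # first-moment (momentum) estimate
    v = b2 * v + (1 - b2) * grad ** 2       # second-moment (RMS) estimate
    m_hat = m / (1 - b1 ** t)               # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)               # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = theta**2, whose gradient is 2*theta
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.05)

print(theta)
```

The combination of momentum (the m term) and per-parameter step scaling (the v term) is exactly why these optimizers are the usual first remedy when plain gradient descent converges slowly.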
If you want to go into more specifics with a particular quiz question, we’ll need to switch to using DMs to have a private conversation about it. To start a DM thread, you click the other person’s name or avatar and then click “Message”.
The other general thing to say is that sometimes it takes some experimentation with the quiz questions to figure out the correct answer. There is no penalty for taking the quiz as many times as you want. The only hassle is that they limit us to 3 submissions per 8 hour period. At least that’s the way it was the last time I tried.
Of course the other wrinkle here is that you’ll notice that you don’t always get the exact same questions in the same order. They have a pool of questions that they select from to keep things interesting. 