My name is Francisco, and I am close to finishing the Machine Learning Specialization.
I have watched the video called ‘Learning the state-value function’, in which a neural network takes the state and action as the input vector X and outputs Q(s,a) as the target Y.
In that network, why do the two hidden layers (before the output layer) contain 64 units each rather than some other number?
As with designing a neural network architecture for any problem, the appropriate number of layers and the number of units per layer depend on the problem and cannot be known without experimenting. This means we need to try. If we begin with a very small NN, say one hidden layer with only 4 units, then, as explained in Course 2, Week 3, we may end up with an underperforming (underfit) model. However, we can progressively increase the size of the NN by adding neurons and layers, and test at each step which size is good enough.
The NN proposed in the slide may or may not be the best option, but it should be good enough to serve the objective.
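The "start small and grow" procedure described above can be sketched roughly as follows. This is only an illustration, not the course's code: the function names `count_parameters` and `smallest_good_enough` are made up for this example, and `evaluate` stands in for whatever training-plus-validation routine you would actually run for a given choice of hidden layers.

```python
from typing import Callable, Optional, Sequence, Tuple

HiddenLayers = Tuple[int, ...]  # e.g. (64, 64) means two hidden layers of 64 units


def count_parameters(layer_sizes: Sequence[int]) -> int:
    """Number of weights and biases in a fully connected network.

    layer_sizes lists every layer, input and output included.
    """
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def smallest_good_enough(
    candidates: Sequence[HiddenLayers],
    evaluate: Callable[[HiddenLayers], float],
    target_error: float,
    n_inputs: int,
    n_outputs: int = 1,
) -> Optional[Tuple[HiddenLayers, float]]:
    """Try candidates from smallest to largest network.

    `evaluate` is a hypothetical callback that trains a network with the
    given hidden layers and returns its validation error. The first
    candidate whose error is at or below `target_error` is returned,
    or None if no candidate reaches it.
    """
    by_size = sorted(
        candidates,
        key=lambda hidden: count_parameters([n_inputs, *hidden, n_outputs]),
    )
    for hidden in by_size:
        error = evaluate(hidden)
        if error <= target_error:
            return hidden, error
    return None
```

For instance, assuming the input vector has 12 entries (a state plus a one-hot action; adjust this to your own setup), `count_parameters([12, 64, 64, 1])` gives 5057 trainable parameters, while a single hidden layer of 4 units gives only 57. Calling `smallest_good_enough([(4,), (16,), (64, 64)], evaluate, target_error, n_inputs=12)` would then stop at the first size that meets your error target instead of jumping straight to the largest network.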