Prof Ng covers these points in quite a bit of detail in Week 1 of Course 2. I’ll give just a high level summary and then you should definitely proceed through Course 2 and hear the full explanation from Prof Ng.
The idea is that the three datasets are for different purposes:
You always use the training data for the training phase, but the “dev” and “test” sets serve different purposes. You train on the training set and then use the “dev” set to evaluate whether the hyperparameters you have chosen are good. That includes everything from the network architecture (number of layers, number of neurons, activation functions …) to the number of iterations, learning rate, regularization parameters and so forth. So in this phase you train on the training data and then evaluate accuracy on the dev set.
Once you have used the training set and dev set to select what you believe are the best choices for the hyperparameters, you then finally evaluate the performance of that “final” model on the test data. The point is that you want the final test to use data that was not involved in any aspect of the training up to that point, so that you get a fair picture of the performance on general input data.
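To make the workflow concrete, here is a toy sketch in NumPy. The data is synthetic and the “hyperparameter” is just the degree of a fitted polynomial standing in for the real choices (architecture, learning rate, etc.), but the three-way discipline is the same: candidates are compared only on the dev set, and the test set is touched exactly once at the end.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D regression problem (made up here purely for illustration).
x = rng.uniform(-1, 1, 200)
y = np.sin(3 * x) + rng.normal(0, 0.1, 200)

# Three-way split: 60% train, 20% dev, 20% test.
idx = rng.permutation(200)
train, dev, test = idx[:120], idx[120:160], idx[160:]

def mse(degree, fit_idx, eval_idx):
    # Fit a polynomial on one subset, measure mean squared error on another.
    coeffs = np.polyfit(x[fit_idx], y[fit_idx], degree)
    pred = np.polyval(coeffs, x[eval_idx])
    return np.mean((pred - y[eval_idx]) ** 2)

# Hyperparameter search: pick the degree with the lowest error
# on the dev set -- never on the test set.
degrees = range(1, 10)
best_degree = min(degrees, key=lambda d: mse(d, train, dev))

# Only now touch the test set, once, to report final performance
# on data that played no role in any of the choices above.
final_test_mse = mse(best_degree, train, test)
print(best_degree, final_test_mse)
```

If you instead tuned the degree directly on the test set, the reported error would be optimistically biased, which is exactly the failure mode the separate dev set prevents.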