There is no rule of thumb for how many hidden layers or units you need. Treat both as hyperparameters: to find the best combination for your validation dataset, you can perform a grid search over them.
In this case, the grid suggests a network with 4 hidden layers of 90 neurons each, for an MSE of 11.5 (as far as I can tell, the lowest value in the grid). You can then run another, finer grid search around that spot to look for an even better architecture, or widen the search range.
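A grid search over layers and units can be sketched roughly as follows. This is only a minimal illustration, not the code that produced the numbers above: `grid_search` and `toy_validation_mse` are hypothetical names, and the toy function is a made-up error surface standing in for "train a network with this shape and return its validation MSE".

```python
from itertools import product


def grid_search(layer_counts, unit_counts, evaluate):
    """Score every (layers, units) pair and return the best pair plus all scores.

    `evaluate(n_layers, n_units)` is supplied by the caller (hypothetical here).
    It should train a network of that shape and return its validation MSE.
    """
    scores = {}
    for n_layers, n_units in product(layer_counts, unit_counts):
        scores[(n_layers, n_units)] = evaluate(n_layers, n_units)
    # Lower MSE is better, so pick the key with the smallest score.
    best = min(scores, key=scores.get)
    return best, scores


def toy_validation_mse(n_layers, n_units):
    # Stand-in for real training: an invented bowl-shaped surface whose
    # minimum sits at 4 layers and 90 units. Replace with actual training.
    return (n_layers - 4) ** 2 + ((n_units - 90) / 10) ** 2


# Coarse grid: 1 to 6 hidden layers, 10 to 100 units per layer in steps of 10.
best, scores = grid_search(range(1, 7), range(10, 101, 10), toy_validation_mse)
print(best, scores[best])
```

To refine the result, you could call `grid_search` again with narrower ranges around `best`, for example `range(3, 6)` layers and `range(80, 101, 5)` units. Keep in mind that each cell of the grid costs one full training run, so the grid size is usually limited by your compute budget.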