In handwritten digit recognition with forward propagation, the model has 3 layers:
layer 1 with 25 units (neurons), layer 2 with 15 units, and layer 3 with 1 unit.

a) How is the number of layers decided for a given classification problem?
b) For each layer, how is the number of units chosen?
Please help me understand this.

These numbers are called hyperparameters, alongside the learning rate (\alpha). They are set before training rather than learned from the data, and what characterizes them is that there is no formal way to derive their optimal values for a given model, whether for classification or regression.

In practice you will end up running trials: you define a collection of candidate values, usually called a parameter grid (for example grid_params), which typically maps each hyperparameter name to a list of values to try. Then, with whatever package you use, you train a model for each combination of these values and keep the one that performs best. This process is called grid search; see scikit-learn's GridSearchCV or Keras Tuner.
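As a rough illustration of the idea above, here is a minimal sketch of a parameter grid and of what a grid search does with it. The names `grid_params`, `expand_grid`, `grid_search`, and `placeholder_score` are made up for this example, and `placeholder_score` only stands in for "train the model and measure validation accuracy"; its formula is invented so the code runs. scikit-learn's GridSearchCV takes a dict of the same name-to-list shape as its `param_grid` argument and adds cross-validation on top.

```python
from itertools import product

# Hypothetical grid: each hyperparameter name maps to the values to try.
grid_params = {
    "hidden_layer_sizes": [(25, 15), (50, 25), (10,)],
    "learning_rate": [0.001, 0.01],
}


def expand_grid(grid):
    """Yield one dict per combination of values in the grid."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(grid, score_fn):
    """Score every combination and return (best_params, best_score)."""
    best_params, best_score = None, float("-inf")
    for params in expand_grid(grid):
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def placeholder_score(params):
    """Stand-in for 'train with params, return validation accuracy'.

    The formula is invented for illustration and has no real meaning.
    """
    n_units = sum(params["hidden_layer_sizes"])
    return 1.0 / (1.0 + abs(n_units - 40)) - params["learning_rate"]


candidates = list(expand_grid(grid_params))
best_params, best_score = grid_search(grid_params, placeholder_score)
print(len(candidates), "combinations tried; best:", best_params)
```

In a real search, `placeholder_score` would be replaced by a function that trains the network with the given values and evaluates it on held-out data. Note that the number of combinations grows multiplicatively with every hyperparameter added to the grid.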

The number of nodes in each layer, and even the number of hidden layers, is determined by the architect of the NN: by you, by me, by the person or group that is creating the NN.

There are several options:

1. You can start from a known model created by researchers; such a model already specifies the number of layers and the number of nodes in each layer.

2. You can start from scratch.
2.1 If you are a very experienced ML designer, you may start with a configuration that, from experience, you know suits the task at hand.
2.2 If you are a novice, you may start with your best guess: for instance, 2 hidden layers, each with 4 nodes.

In any case, you’ll design your NN, train it, and watch the results. If the results do not meet your objectives, you start ‘fine-tuning’ your architecture (although fine-tuning the data may be more important, but that’s another topic). In the end you will hopefully reach a design (number of layers, number of units in each layer, etc.) that meets your objectives.
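To make "changing the architecture" concrete, the following sketch treats the design as a plain list of unit counts and runs forward propagation through it. It is only an illustration under stated assumptions: the input size of 400 features, the sigmoid activation on every layer, the random untrained weights, and the helper names `build_network`, `forward`, and `count_parameters` are all choices made for this example, not something given in the question.

```python
import math
import random


def build_network(n_inputs, layer_sizes, seed=0):
    """Create random weights and zero biases for a stack of dense layers.

    layer_sizes lists the units per layer, e.g. [25, 15, 1].
    """
    rng = random.Random(seed)
    layers = []
    fan_in = n_inputs
    for n_units in layer_sizes:
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(fan_in)]
                   for _ in range(n_units)]
        biases = [0.0] * n_units
        layers.append((weights, biases))
        fan_in = n_units  # this layer's outputs feed the next layer
    return layers


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, layers):
    """Forward propagation: apply each dense layer, then a sigmoid."""
    activation = x
    for weights, biases in layers:
        activation = [
            sigmoid(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, biases)
        ]
    return activation


def count_parameters(layers):
    """Total number of weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


x = [0.0] * 400  # assumed input size, e.g. a flattened 20x20 image
for layer_sizes in ([25, 15, 1], [4, 4, 1]):
    net = build_network(len(x), layer_sizes)
    output = forward(x, net)
    print(layer_sizes, "->", len(output), "output(s),",
          count_parameters(net), "parameters")
```

Trying a different design only means passing a different list, which is why the number of layers and units can be searched over like any other hyperparameter.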