I have a question about activation functions. Is the activation function the same for all units of the same hidden layer, or can it be different?
Suppose you have one hidden layer with two neurons (units). Must these two neurons use the same activation function, or is the activation function chosen per unit?
Usually it is the same for all neurons in a hidden layer. For the output layer, though, it can make sense to give different neurons different activation functions. For example, suppose your network predicts both the number of people in an image (a non-negative real number, so ReLU activation) and whether the weather in the image is sunny or cloudy (a probability, so sigmoid activation). Then your output layer has two neurons with different activation functions.
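Here is a minimal sketch of that idea in PyTorch. The class name, layer sizes, and input dimension are made up for illustration; the point is just that every hidden unit shares ReLU, while the two output units get different activations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PeopleWeatherNet(nn.Module):
    """Hypothetical network: one shared hidden layer, two output units
    with different activations (ReLU for the count, sigmoid for the
    sunny/cloudy probability)."""

    def __init__(self, in_features=128, hidden=64):
        super().__init__()
        self.hidden = nn.Linear(in_features, hidden)  # hidden layer
        self.out = nn.Linear(hidden, 2)               # two output units

    def forward(self, x):
        h = F.relu(self.hidden(x))           # same ReLU for every hidden unit
        z = self.out(h)
        count = F.relu(z[:, 0])              # unit 1: non-negative people count
        sunny_prob = torch.sigmoid(z[:, 1])  # unit 2: probability of sunny weather
        return count, sunny_prob

# Usage with a dummy batch of 4 feature vectors:
x = torch.randn(4, 128)
count, sunny_prob = PeopleWeatherNet()(x)
```

In practice the two outputs would then be trained with different loss terms (e.g. a regression loss for the count and a binary cross-entropy loss for the weather probability), but the activation choice per output unit is the part relevant to your question.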