Why is it that only the ReLU activation requires a loop to build an L-layer model, while the sigmoid function requires no looping?
First we build a model with many layers, and we want our function to generalize over any number of layers. We decide that the activation function for all the hidden layers is ReLU, so we write a for loop over the hidden layers. For the output layer we choose sigmoid as its activation function, because we are building a binary classification model, like the image below.
I see, so this is for a specific L-layer model with a sigmoid output, right?
Yes. The network architecture we are implementing here can have any number of hidden layers, and you can choose the sizes of all of those layers, but every hidden layer uses ReLU as its activation function. The network performs binary classification, so there is one neuron in the output layer and its activation function needs to be sigmoid to convert the output into the probability that the answer is “yes”.
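To make the structure concrete, here is a minimal sketch of that forward pass in NumPy. The function and parameter names (`L_model_forward`, `W1`/`b1`, etc.) are illustrative assumptions, not the exact code from the assignment: the single loop covers hidden layers 1 through L-1 (all ReLU), and the sigmoid output layer is the one step handled outside the loop.

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def L_model_forward(X, parameters):
    """Forward pass for an L-layer network: ReLU hidden layers, sigmoid output.

    parameters is assumed to hold weights W1..WL and biases b1..bL,
    one pair per layer (hypothetical naming convention).
    """
    A = X
    L = len(parameters) // 2  # two entries (W, b) per layer

    # The hidden layers all share the same ReLU activation,
    # so one generic loop handles every layer from 1 to L-1.
    for l in range(1, L):
        Z = parameters["W" + str(l)] @ A + parameters["b" + str(l)]
        A = relu(Z)

    # The output layer is a single special case: one sigmoid unit
    # producing P(y = 1), so it sits outside the loop.
    ZL = parameters["W" + str(L)] @ A + parameters["b" + str(L)]
    return sigmoid(ZL)
```

Because only the hidden layers are uniform, the loop is what lets the same function serve any depth L; the sigmoid appears exactly once, so it never needs to be looped over.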