Hello, around 2:47 of the lecture titled above in Week 3, Dr. Ng mentions that if we use a linear activation function in the first hidden layer and a sigmoid in the second, “the model is no more expressive than standard logistic regression without hidden layers.” Could you explain why, and how to prove this? Thank you for your patience.

Best regards,

Juheon

A linear transformation produces a linear output: the hidden layer just passes its input through a linear map, so on its own it cannot fit complex, non-linear behaviour. That's why he says the model is the same as logistic regression! The linear layer is just a pass-through into the sigmoid.
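A quick numerical sanity check of the pass-through idea, as a minimal sketch in plain Python. It assumes one hidden layer with the identity activation g(z) = z followed by a single sigmoid output unit; the layer sizes, random weights and helper names (`two_layer`, `collapse`, `logistic_regression`) are illustrative and not from the course. It folds the two layers into one weight vector and bias, then compares the outputs.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A (n x k) by matrix B (k x m)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def vadd(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def two_layer(x, W1, b1, W2, b2):
    # Hidden layer with a linear (identity) activation: a1 = W1 x + b1
    a1 = vadd(matvec(W1, x), b1)
    # Output layer: one sigmoid unit applied to z2 = W2 a1 + b2
    z2 = vadd(matvec(W2, a1), b2)[0]
    return sigmoid(z2)

def collapse(W1, b1, W2, b2):
    # Fold both layers into one: w = W2 W1 (1 x n_x), b = W2 b1 + b2
    w = matmul(W2, W1)
    b = vadd(matvec(W2, b1), b2)
    return w, b

def logistic_regression(x, w, b):
    # Plain logistic regression: sigmoid(w x + b), no hidden layer
    return sigmoid(vadd(matvec(w, x), b)[0])

random.seed(0)
n_x, n_h = 3, 4  # hypothetical sizes: 3 input features, 4 hidden units
W1 = [[random.gauss(0, 1) for _ in range(n_x)] for _ in range(n_h)]
b1 = [random.gauss(0, 1) for _ in range(n_h)]
W2 = [[random.gauss(0, 1) for _ in range(n_h)]]
b2 = [random.gauss(0, 1)]

w, b = collapse(W1, b1, W2, b2)
x = [0.5, -1.2, 2.0]
print(two_layer(x, W1, b1, W2, b2))
print(logistic_regression(x, w, b))
```

The two printed values should agree up to floating-point rounding, for any choice of weights and input: the hidden layer adds parameters but no new shapes of decision boundary.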


Thank you so much for your clear explanation!


It is an easy-to-prove mathematical theorem that the composition of linear functions is still linear. So unless you include a non-linearity at each layer of the network, there is literally no point in having multiple layers: you can't learn a more complex function. The point of adding layers with non-linearities is that it greatly increases the complexity of the functions that can be learned, which is why networks with more (non-linear) layers are more powerful.
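For the exact case in the lecture, the composition argument can be written out in the course's $W^{[l]}, b^{[l]}$ notation. This assumes a hidden layer with the linear activation $g(z) = z$ and a single sigmoid output unit, so $W^{[2]}$ is $1 \times n^{[1]}$:

$$
\begin{aligned}
a^{[1]} &= W^{[1]}x + b^{[1]} \\
z^{[2]} &= W^{[2]}a^{[1]} + b^{[2]} = W^{[2]}\left(W^{[1]}x + b^{[1]}\right) + b^{[2]} \\
&= \underbrace{W^{[2]}W^{[1]}}_{w^{T}}\,x + \underbrace{W^{[2]}b^{[1]} + b^{[2]}}_{b} \\
\hat{y} = a^{[2]} &= \sigma\left(z^{[2]}\right) = \sigma\left(w^{T}x + b\right)
\end{aligned}
$$

The last line is exactly logistic regression with weights $w^{T} = W^{[2]}W^{[1]}$ and bias $b = W^{[2]}b^{[1]} + b^{[2]}$, so anything this two-layer network can represent, logistic regression can represent too. (Strictly speaking the maps are affine because of the bias terms, but affine maps are also closed under composition, so the argument is unchanged.)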
