In the lecture video (Why do we need non-linear activation functions?), I cannot understand the statement below. Can you please explain what it means?

But other than that, using a linear activation function in the hidden layer, except for some very special circumstances relating to compression that we’re going to talk about, using the linear activation function is extremely rare

I am not sure what Andrew is referring to when he mentions compression in passing. The key takeaway from that lecture is to use non-linear activation functions, except for regression problems. In that case, it makes sense to use the identity activation function in the output layer. However, if you are predicting housing prices, as Andrew mentions, it might make sense to use ReLU there as well, if prices never go below 0.
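To make that takeaway concrete, here is a minimal sketch in plain Python. The tiny network and all of its weights are made up for illustration and are not from the lecture. It shows that with an identity (linear) activation in the hidden layer, the two layers collapse into a single linear map, while ReLU in the hidden layer gives something a single linear layer cannot reproduce in general. The last call uses ReLU in the output layer too, which keeps the prediction from going below 0.

```python
# Hypothetical toy network: 2 inputs -> 2 hidden units -> 1 output.
# All weights below are made-up numbers chosen only for illustration.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def relu(v):
    return [max(0.0, vi) for vi in v]

def identity(v):
    return list(v)

W1 = [[1.0, -2.0], [0.5, 1.0]]  # hidden layer weights
b1 = [0.0, 1.0]                 # hidden layer biases
W2 = [[2.0, -1.0]]              # output layer weights
b2 = [0.5]                      # output layer bias

def forward(x, hidden_activation, output_activation=identity):
    """Forward pass with a configurable hidden and output activation."""
    z1 = [z + b for z, b in zip(matvec(W1, x), b1)]
    a1 = hidden_activation(z1)
    z2 = [z + b for z, b in zip(matvec(W2, a1), b2)]
    return output_activation(z2)

# With an identity hidden layer the network is one linear map:
#   W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
W_combined = matmul(W2, W1)
b_combined = [z + b for z, b in zip(matvec(W2, b1), b2)]

def collapsed(x):
    """Single linear layer equivalent to the identity-hidden network."""
    return [z + b for z, b in zip(matvec(W_combined, x), b_combined)]

x = [1.0, 2.0]
linear_out = forward(x, identity)  # identity hidden, identity output
collapsed_out = collapsed(x)       # one linear layer, same result
relu_out = forward(x, relu)        # ReLU hidden, identity output

print(linear_out, collapsed_out)   # [-9.0] [-9.0]
print(relu_out)                    # [-3.0]
print(forward(x, relu, relu))      # [0.0], ReLU output is never below 0
```

So the hidden layer only adds expressive power when its activation is non-linear. The identity activation is reserved for the output layer of a regression model, where the target can be any real number.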

Hi @jonaslalin, you mentioned using non-linear activation functions, except for regression problems. But Prof. Andrew Ng says in the lecture to use a non-linear activation function in the hidden layer even for regression problems. Is there any specific reason for that?