The linear activation function has this contour shape and one global minimum value. So if I choose a linear function for a hidden layer, will I get the same weights for each unit in that layer, no matter how the seed value is selected?

If I understood you correctly, each row in W_i (the random weight matrix for the i-th layer, initialized just before forward propagation) is just some random numbers, and this has nothing to do with whether we have a special activation on top of the linear combination or not.
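A minimal NumPy sketch of that point (the function name and values here are made up for illustration, not from any course notebook): the initialization draws random values based only on the layer shape and the seed, independent of whichever activation is applied afterward, so the units do not start out with identical weights.

```python
import numpy as np

def init_layer(n_units, n_inputs, seed):
    """Randomly initialize a weight matrix W_i for one layer.

    The draw depends only on the shape and the seed, not on which
    activation (linear, ReLU, sigmoid, ...) is applied afterward.
    """
    rng = np.random.default_rng(seed)
    return rng.normal(scale=0.01, size=(n_units, n_inputs))

W = init_layer(n_units=3, n_inputs=4, seed=0)

# Each row (one unit's weights) is a different random vector,
# so the units in the layer start from different weights.
assert not np.allclose(W[0], W[1])

# A different seed gives a different starting point.
W2 = init_layer(n_units=3, n_inputs=4, seed=1)
assert not np.allclose(W, W2)
```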

Hello @Hong_Shen,

Let’s be careful with the words, because they carry meaning. An activation function does not give you that shape. The squared loss of a linear regression problem gives you that curve.
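To illustrate the distinction, here is a small sketch with made-up toy data (not from the lecture): the bowl shape with a single global minimum belongs to the squared loss viewed as a function of the parameters, not to the activation function itself.

```python
import numpy as np

# Toy 1-D regression data, invented for illustration: y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

def squared_loss(w, b):
    """Mean squared error of the linear model w*x + b on the toy data."""
    return np.mean((w * x + b - y) ** 2)

# The loss surface over (w, b) is a convex bowl: the midpoint of any
# two parameter points has loss no higher than the average of theirs.
p1, p2 = (0.0, 0.0), (4.0, 2.0)
mid = ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2)
assert squared_loss(*mid) <= 0.5 * (squared_loss(*p1) + squared_loss(*p2))

# The single global minimum sits at the true parameters (w, b) = (2, 1).
assert np.isclose(squared_loss(2.0, 1.0), 0.0)
```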

Check this lecture video out and come up with an answer yourself.

Cheers,

Raymond
