Will the units under one layer end up with different weights if the activation function is linear?

The linear activation function has this contour shape with one global minimum. So if I choose a linear activation function for a hidden layer, will I get the same weights for each unit in that layer, no matter how the seed value is selected?


If I understood you correctly, each row in W_i (the random weight matrix for the i-th layer, initialized just before forward propagation) is just random numbers, and that has nothing to do with whether we put a special activation on top of the linear combination or not.
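A minimal NumPy sketch of that point (layer sizes and values are made up for illustration): the rows of a randomly initialized W are independent draws, so the units start out different regardless of which activation sits on top, and the symmetry between units is already broken before any training happens.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # any seed works; the point is the rows differ

# Hypothetical hidden layer: 3 units, 4 inputs, linear activation a = W @ x + b
W = rng.normal(size=(3, 4)) * 0.01
b = np.zeros((3, 1))

x = rng.normal(size=(4, 1))
a = W @ x + b  # with a linear activation the output IS the linear combination

# Each row of W is an independent random draw, so no two units start identical;
# their gradients will differ too, so they stay distinct during training.
print(np.allclose(W[0], W[1]))
```

With a symmetric initialization (e.g. all zeros) every unit would receive identical gradients and stay identical, which is exactly why the initialization is random.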

Hello @Hong_Shen,

Let’s be careful with words, because they carry meaning. An activation function does not give you that shape; the squared loss of a linear regression problem gives you that curve.
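To make the distinction concrete, here is a toy sketch (made-up 1-D data) showing that the bowl shape comes from the squared loss as a function of the weight, not from any activation:

```python
import numpy as np

# Toy 1-D linear regression: J(w) = mean((w*x - y)^2) is a parabola in w,
# with a single global minimum. No activation function is involved in this shape.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])  # data generated with true slope 2

def J(w):
    return np.mean((w * x - y) ** 2)

ws = np.linspace(-1, 5, 61)
losses = np.array([J(w) for w in ws])

# The loss is minimized at w = 2 and curves upward on both sides of it.
print(ws[np.argmin(losses)])
```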

Check out this lecture video and come up with an answer yourself.
