In the course 2, week 2 video, Ng shows why a linear activation function doesn’t work. During the simpler example of the second slide, he says, “… what we’ve just shown is that a2 is equal to w x plus b. So w is just a linear function of the input x.” I understand what is written on the slide, but the statement “w is a linear function of x” sounds to me like saying w = mx + b (or, using w again, w = wx + b). But nowhere on this slide do I see wx + b substituted for w. I see only wx + b being substituted for a1, which is like substituting wx + b for the x in wx + b, giving w(wx + b) + b. That is different from substituting wx + b for the w and getting (wx + b)x + b = wx^2 + bx + b.
So how is w itself a linear function of the input x, when we use only the linear activation function (i.e., no activation function)?
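To make the two substitutions concrete, here is a small Python sketch of how I read them. The values w = 2 and b = 1 and all the function names are made up for illustration; they are not from the slide.

```python
# Hypothetical values, chosen only so the arithmetic is easy to follow.
w, b = 2.0, 1.0

def substitute_for_input(x):
    # What I see on the slide: wx + b goes where the input was.
    # w(wx + b) + b
    return w * (w * x + b) + b

def collapsed(x):
    # The same expression multiplied out: (w*w)x + (w*b + b).
    # It still has the shape "some weight times x plus some bias".
    return (w * w) * x + (w * b + b)

def substitute_for_weight(x):
    # The other reading: wx + b goes where the weight w was.
    # (wx + b)x + b = wx^2 + bx + b
    return (w * x + b) * x + b

for x in [0.0, 1.0, 2.0]:
    print(x, substitute_for_input(x), collapsed(x), substitute_for_weight(x))
```

With these made-up numbers the first two columns agree at every x, while the third grows quadratically, so only the first substitution stays a straight line in x.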
Ng also says, “If you’re familiar with linear algebra, this result comes from the fact that a linear function of a linear function is itself a linear function.” This again makes no sense to me. Wouldn’t a linear function of x be a linear function, no matter whether x is itself a linear function or not? It seems superfluous to single out the case that a linear function of a linear function is a linear function, when a linear function of ___ (i.e., of whatever you care to put in the blank) is a linear function. It’s like replacing a stronger statement with a weaker one.
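A quick numerical way to test the two readings, as a sketch only: apply the same linear outer function to a linear inner function and to a nonlinear one (x squared), then check whether the result is a straight line in the original input x. The coefficients and function names are hypothetical. The check uses the second difference f(x + h) - 2f(x) + f(x - h), which is zero for any straight line in x and nonzero for a quadratic.

```python
def outer(z):
    # Linear in its own argument z (hypothetical coefficients).
    return 3.0 * z - 4.0

def inner_linear(x):
    # Linear in x.
    return 2.0 * x + 1.0

def inner_square(x):
    # Not linear in x.
    return x * x

def second_difference(f, x, h=1.0):
    # Zero for any function of the form m*x + c; nonzero for a quadratic.
    return f(x + h) - 2.0 * f(x) + f(x - h)

def composed_linear(x):
    return outer(inner_linear(x))

def composed_square(x):
    return outer(inner_square(x))

print(second_difference(composed_linear, 1.0))  # 0.0
print(second_difference(composed_square, 1.0))  # 6.0
```

If this check is right, the outer function is always linear in whatever is put in the blank, but the composition is a straight line in x only when the thing in the blank is itself linear in x. That may be the distinction the quote is drawing.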
I’m sure at least some of my lack of understanding is just ignorance on my part. Thanks for any help in understanding these particular statements by Ng.