Course 2 - ML Specialization - Why do we need Activation Functions?

Hi All,

In Week 2 of Advanced Learning Algorithms, I didn't understand the basis of Mr. Ng’s lecture “Why do we need Activation Functions?”. In particular, I don't understand the math on the first slide, where a = wx + b. Can anyone please explain this to me? Thank you.

1 Like

Hi @shsuratwala,

Here's a simple example. You know, a price discount is a linear transformation, and a 20% off discount on a $100 product is just the transformation 100 * 0.8 = 80.

Now, if I give three discounts on the product - 20% off for a winter sale, an extra 25% off for a limited-time offer, and another 10% off for a membership offer - then the transformation will be 100 * 0.8 * 0.75 * 0.9 = 54.

In the above, there are three linear transformations, but in our hearts we know we can replace them with just one linear transformation - a total discount of 46% off. With that, the next time another customer buys a $270 product, we can do 270 * 0.54 = 145.8 right away.

So the idea here is that a linear transformation of another linear transformation of another linear transformation ( * 0.8 * 0.75 * 0.9 ) can be combined into one linear transformation ( * 0.54 ).
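
In case it is easier to follow in code, here is a tiny Python sketch of the same arithmetic (just a sketch of the example above, nothing from the lecture itself):

```python
# Three discounts (linear transformations) collapse into a single one.
discounts = [0.8, 0.75, 0.9]   # 20% off, 25% off, 10% off

combined = 1.0
for d in discounts:
    combined *= d              # compose the linear transformations

print(round(combined, 10))        # 0.54 -> a single "46% off" discount
print(round(100 * combined, 10))  # 54.0, same as 100 * 0.8 * 0.75 * 0.9
print(round(270 * combined, 10))  # 145.8, reusing the single combined discount
```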

The slide below is just showing the same idea - two linear transformations can be combined into one linear transformation. It will be clearer if you plug some numbers into the w's and the b's, which is easy because they are all just scalars.

The above is just some maths, but the key idea is this: no matter how many layers we build into our neural network, as long as only linear transformations exist in it, it is no smarter than a network of just one layer. In other words, we would be wasting our time training so many layers for the effect of just one layer.
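
You can also watch this collapse happen numerically. Below is a minimal NumPy sketch (mine, not from the course notebooks) of two one-unit linear "layers" being replaced by a single one:

```python
import numpy as np

rng = np.random.default_rng(0)
w1, b1 = rng.normal(), rng.normal()   # layer 1 weight and bias (scalars, as on the slide)
w2, b2 = rng.normal(), rng.normal()   # layer 2 weight and bias

x = np.linspace(-3, 3, 7)             # some example inputs

# Pass x through the two linear "layers", one after the other.
a1 = w1 * x + b1
a2 = w2 * a1 + b2

# The same result from a single linear layer with combined weights.
w = w2 * w1
b = w2 * b1 + b2
a_single = w * x + b

print(np.allclose(a2, a_single))      # True: two linear layers collapse into one
```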

The second slide of the video elaborates on the idea of the first slide in simple maths. Try to fully understand the second slide first, then go back to the first and see what you can and can't make sense of. After that, if you still have questions, try to ask a more specific one, such as: which sentence did you not get?

Raymond

2 Likes

Thank you so much for your explanation, Raymond. I just don't understand one line of the derivation, where Mr. Ng sets w2 * b1 + b2 = b. Can you please explain that? Thank you

1 Like

(w_2 w_1) \, x = w \, x

w_2 b_1 + b_2 = b

The above two equations, respectively, group terms that are associated with x and terms that are not associated with x.

The w's and b's are no different - they are all just trainable weights. The two groupings help us recognize the following form:

\text{some trainable weights} \times x + \text{some trainable weights}

and tuning those two groups of trainable weights is no different from tuning just two trainable weights, w and b, in wx + b.
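
Written out in one line with the same scalar notation (this is just the slide's algebra condensed):

w_2 (w_1 x + b_1) + b_2 \;=\; \underbrace{(w_2 w_1)}_{w} \, x \;+\; \underbrace{(w_2 b_1 + b_2)}_{b} \;=\; w x + b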

Cheers,
Raymond

1 Like

Hi @rmwkwok
Similarly, if we were to use only the sigmoid function on every unit of the neural network, how would the neural network be different from a one-layer logistic regression? Doesn't the composition of sigmoid functions also give a sigmoid function? Could you please clear up my doubt? Thank you.

2 Likes

Hi @VICTORIA_JOSE,

The lecture has provided us with a way to prove it -

You see, after the maths, the outcome is just

  • one linear equation of
  • the input variable x.

I split the sentence and put the two key points as bullet points.

Now you can check for yourself whether, after replacing all the linear activations with sigmoid activations, the outcome is also

  • one sigmoid-activated linear equation of
  • the input variable x.

If you can, then please show me the steps if you would like to continue this discussion with me :wink:

If you can't, then you have answered your own question: two sigmoid-activated layers are not the same as just one layer.
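
While you work through the algebra, here is a quick numerical sanity check (a sketch of mine, not a proof and not from the lecture). If the two-layer output could be written as a single sigmoid(wx + b), then applying the inverse sigmoid (the logit) to the output would have to give a straight line in x - but it does not:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    # inverse of the sigmoid
    return np.log(p / (1.0 - p))

# arbitrary weights for two sigmoid-activated "layers" of one unit each
w1, b1 = 2.0, -1.0
w2, b2 = 3.0, 0.5

x = np.linspace(-2.0, 2.0, 5)         # equally spaced inputs
a1 = sigmoid(w1 * x + b1)
a2 = sigmoid(w2 * a1 + b2)            # output of the two-layer network

# If a2 == sigmoid(w * x + b) for some w and b, then logit(a2) would be linear
# in x, so its second differences would all be (close to) zero. They are not.
print(np.diff(logit(a2), n=2))        # clearly nonzero -> not one sigmoid layer
```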

Cheers,
Raymond

1 Like

Thank you so much for the explanation.

1 Like