Why non-linear activation function

Well_Zhang · February 9, 2023, 3:56am

I understand that without a non-linear activation function, models would not be able to learn complex relationships between inputs and outputs. However, the explanation in the optional lab confused me.

Why are there different segments in the diagram? For neurons in the same layer, aren’t they working simultaneously instead of sequentially?
Why is unit 0 fixed?
What do “target” and “match target” mean in the diagram?

I’m sorry if this sounds like a lot of things I don’t understand…

rmwkwok · February 9, 2023, 4:15am

Hello @Well_Zhang,

In the above, I have two ReLUs. If I add them up (ReLU(x) + ReLU(x-2)), how many segments will there be in the final curve?

And when I have three ReLUs, how many segments will there be in the final curve that adds all of them up?

Raymond

Well_Zhang · February 9, 2023, 4:54am

In the last diagram there should be three segments, right? [0, 2), [2, 4), and [4, inf)

Thank you for the comment. I think I’ve got the point after trying out the interactive exercise in the lab. A final question: in a real neural network training process, will each unit be responsible for a specific segment, or do they all just contribute to the entire model without a clear division of work?

rmwkwok · February 9, 2023, 5:29am

Almost. There are 4 segments: (-inf, 0), [0, 2), [2, 4), [4, inf), but I think you have got the idea.
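To check the segment count numerically, here is a minimal NumPy sketch. It assumes the three ReLUs are ReLU(x), ReLU(x-2) and ReLU(x-4); the third one is inferred from the segment boundaries above, and the names `relu`, `f`, `probe` and `slopes` are only for illustration. It measures the slope of the summed curve at one point inside each segment:

```python
import numpy as np

def relu(z):
    # ReLU: zero for negative inputs, identity otherwise
    return np.maximum(0.0, z)

def f(x):
    # Assumed example: three ReLUs with kinks at x = 0, 2 and 4
    return relu(x) + relu(x - 2) + relu(x - 4)

# One probe point inside each of (-inf, 0), [0, 2), [2, 4), [4, inf)
probe = np.array([-1.0, 1.0, 3.0, 5.0])

# Finite-difference slope; h is small enough that probe + h stays
# inside the same segment, so the slope is exact for a piecewise-linear f
h = 0.5
slopes = (f(probe + h) - f(probe)) / h

print(slopes)  # slopes are 0, 1, 2, 3: four different slopes, four segments
```

Each ReLU "switches on" at its own kink and adds 1 to the slope from there onwards, which is why the curve bends at 0, 2 and 4.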

Yes, the idea is pretty similar, except that we don’t just sum the ReLUs up as equals. Instead, we take a weighted sum: $w_0\text{ReLU}(z_0) + w_1\text{ReLU}(z_1) + w_2\text{ReLU}(z_2) + \dots$. In my last example, 3 ReLUs give 4 segments, and sometimes (as in the optional lab), 3 ReLUs can give just 3 segments. Instead of saying definitively how many ReLUs will give how many segments, I prefer to just say that more ReLUs can give you more segments, which means that a very complicated curve will require more ReLUs to get a perfect fit.

If you ask what a ReLU in layer k contributes to the output of the same layer k, then the idea is exactly what the optional lab tells you. But if you ask what a ReLU in layer k does to the output of another layer k+3, then it becomes difficult to say, or in your words, unclear, even though you can always test it out.
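A rough sketch of the weighted-sum idea, under some assumptions: each unit computes z_i = x - c_i with a kink at c_i, and the weights and kink positions below are made-up values, not the ones from the optional lab. The helper `count_segments` is hypothetical; it counts linear pieces by tracking how the slope changes at each kink.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def weighted_relu_sum(x, w, kinks):
    # sum_i w_i * ReLU(x - c_i), assuming each unit's input is z_i = x - c_i
    x = np.asarray(x, dtype=float)
    return sum(w_i * relu(x - c_i) for w_i, c_i in zip(w, kinks))

def count_segments(w, kinks):
    # Hypothetical helper: the slope is 0 to the left of every kink; just to
    # the right of a kink c it equals the sum of weights of units with c_i <= c.
    w = np.asarray(w, dtype=float)
    kinks = np.asarray(kinks, dtype=float)
    slopes = [0.0]
    for c in np.unique(kinks):
        slope = w[kinks <= c].sum()
        if not np.isclose(slope, slopes[-1]):  # a new segment needs a new slope
            slopes.append(slope)
    return len(slopes)

# Unequal (made-up) weights, three separate kinks: still 4 segments,
# but the slopes are now 0, 1, -1, 0.5 instead of 0, 1, 2, 3
n_weighted = count_segments([1.0, -2.0, 1.5], [0.0, 2.0, 4.0])

# One hypothetical way to get only 3 segments from 3 ReLUs:
# two units share the same kink, so the curve bends in only two places
n_shared = count_segments([1.0, 1.0, 1.0], [0.0, 2.0, 2.0])

print(n_weighted, n_shared)
```

The weights decide how much each ReLU bends the curve (and in which direction), while the kink positions decide where the bends happen.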

Cheers,
Raymond
