I have a few questions on the ReLU activation function:
Normally, the ReLU function is a flat line on the left and rises to the right. But I guess that shape can be rotated to any angle by different w and b, like a linear function? And can the flat part be on the right side instead of always on the left, like the 4 little ReLU shapes in the screenshot?
I don’t understand why the 4 ReLU lines can connect to one another seamlessly at their endpoints.
Can I say the final composite ReLU line takes each small ReLU component one by one, like in the colored screenshot? If so, why is the green part rising instead of sloping downward like the green one at the bottom?
Yes, the ReLU shape can effectively shift or flip depending on the weight and bias. It’s still the same function (0 for negative input, linear for positive input), but the weight sets the slope (and its sign) and the bias moves the activation threshold left or right. So the flat (zero) part can appear on the right side instead of the left, depending on those values.
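Here’s a minimal NumPy sketch (the weights and biases are just toy values I picked) showing how the sign of w decides which side is flat, and how b moves the kink:

```python
import numpy as np

def relu(z):
    # ReLU: zero for negative input, identity for positive input
    return np.maximum(0, z)

x = np.linspace(-3, 3, 7)   # [-3, -2, -1, 0, 1, 2, 3]

# w > 0: flat (zero) part on the left, rising part on the right
print(relu(1.0 * x + 0.5))

# w < 0: flat (zero) part on the right, rising part toward the left
print(relu(-1.0 * x + 0.5))

# In both cases the kink (the activation threshold) sits at x = -b / w
```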
The reason the individual ReLU lines appear to connect seamlessly in the final output is not that they are stitched together one by one, but that they are all applied in parallel and their outputs are summed. Each ReLU unit activates over a different input range, and together they form a piecewise linear approximation.
So no, the final curve doesn’t follow each colored ReLU one after another; it adds them all up at each point. That’s why the green part of the final curve can be rising even if the green ReLU unit alone slopes downward: other units are contributing stronger positive values at that point.
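To make the “summed, not stitched” point concrete, here is a tiny toy example (my own made-up units, not the ones in your screenshot) where the composite rises over a range even though one unit slopes downward there:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

x = np.linspace(0, 4, 9)        # the same input is fed to every unit

a_red   = relu( 2.0 * x - 0.0)  # rises from x = 0
a_green = relu(-1.0 * x + 3.0)  # slopes downward, flat after x = 3
a_blue  = relu( 1.0 * x - 2.0)  # rises from x = 2

# The final curve is the pointwise sum of ALL units at every x.
composite = a_red + a_green + a_blue
print(composite)
# Over [0, 2] the composite rises (net slope 2 - 1 = +1) even though the
# "green" unit alone is sloping downward there.
```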
Hope it helps! Feel free to ask if you need further assistance.
Thanks Alireza! This is a very clear explanation! Just one more follow-up to your reply that “So the flat (zero) part can appear on the right side instead of the left, depending on those values.” Since the 4 small ReLUs use the same input, which means their outputs are added up together rather than each ReLU getting a different input, the final composite will not always have a flat (zero) part, right? Because if one of the ReLUs has its flat part on the right, then the final composite could be something like this?
The key point here is that the ReLU activation allows you to make a piecewise linear approximation of any complex curve, if you provide enough units and training examples.
The ReLU shape can be flipped vertically by the weight value, and can be shifted horizontally by the bias value.
The secret to ReLU’s ability is that the output of the system is the weighted sum of the contributions from every ReLU unit. So there is no one-to-one relationship between the segments and individual ReLU units. All units contribute to all segments.
Additionally, the output unit has its own weight and bias values. So it can adjust the contribution of each ReLU unit.
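As a rough sketch of that structure (the layer size and all the numbers below are my own toy choices, not the values from my example), the output is the weighted sum of every ReLU unit’s activation plus the output bias:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# Toy network: 1 input -> 3 ReLU units -> 1 linear output
W1 = np.array([1.0, -2.0, 0.5])     # hidden weights (slope/flip of each unit)
b1 = np.array([-1.0, 3.0, -0.5])    # hidden biases (shift each unit's kink)
W2 = np.array([0.7, -0.3, 1.2])     # output weights (scale each unit's contribution)
b2 = 0.1                            # output bias

def forward(x):
    a = relu(W1 * x + b1)           # every unit sees the same input x
    return W2 @ a + b2              # weighted sum over all units, plus the output bias

print([round(forward(x), 2) for x in np.linspace(-2.0, 4.0, 7)])
```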
I worked out an example of using ReLU units to learn a complex curve, and posted it in the forum about a year ago. I’ll try to find that example and post a link to it here.
Thank you TMosh for the post you found! I’ve read it and have a few questions:
In that post, you gave two shapes of ReLU, one when w is negative and one when w is positive. Can ReLU look like the shapes below (the two red shapes on the right)? If so, what would w and b be like?
In the post, you said “Because of the shape of this training set (all y values are positive), all of the bias values are negative.” Why is that?
In the post, you said “All five units have different bias values, which allows each curve to shift vertically.” Shouldn’t it be horizontally, moving the curve left and right?
Yes, the shape is controlled by the sign of the weight and bias values.
Because all of the ‘y’ values are positive. You can work out an example by hand; if you find a discrepancy in my explanation, please post it here.
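To work a single unit by hand (toy numbers of my own, not the values from that post): the kink sits where wx + b = 0, i.e. at x = -b/w, so once you decide where a unit should switch on, the sign of b follows from the sign of w and of that location.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

w = 2.0
kink = 1.5              # where we want this unit to switch on (arbitrary choice)
b = -w * kink           # solve w*x + b = 0  ->  b = -3.0, i.e. negative
print(b)

x = np.array([1.0, 1.5, 2.0])
print(relu(w * x + b))  # [0. 0. 1.] -- zero up to the kink, rising after it
```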
It depends on how one wants to describe it. Perhaps I could have chosen a different set of words. The sign of the weights and biases is the mathematical key.
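One way to see why both descriptions are defensible (toy values of my own): increasing b moves the pre-activation line wx + b up or down, but after the ReLU the visible effect is the kink sliding horizontally to x = -b/w.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

x = np.linspace(-2, 2, 9)
w = 1.0

for b in (1.5, 0.5, -1.0):
    z = w * x + b                # increasing b shifts this line vertically...
    kink = -b / w                # ...but the post-ReLU kink moves horizontally
    print(f"b={b:+.1f}, kink at x={kink:+.1f}:", relu(z))
```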
I read the lab notebook again and I’m a bit confused by the graph below. If, as you said, the ReLU lines all apply together to every input rather than each being responsible for one input range and then being put together, then why does the screenshot say unit 1’s ReLU limits its impact to [0,1] and unit 2’s ReLU is cut off until x=1?
Ah, I see. I got totally confused by the graph and forgot the function is max(0, wx+b), so unit 0 will be flat from x=1 onwards. But why does the screenshot/graph say “ReLU restricts impact to [0,1]”? Why does it start from 0 instead of from negative infinity, which is the whole left side of the line?