Convolutional Neural Networks
#week1
#relu
Deep Learning Specialization
Hi, I have a question.
First, let's assume we have a convnet in which every layer uses a ReLU activation.
ReLU works fine in the first hidden layer, and after it all of the outputs are >= 0. But for the next layers, applying ReLU doesn't seem to change anything, since the values are still >= 0. Overall, that would make our convnet linear, just like a logistic regression.
Can you tell me why it still works well, and where my mistake is?
Hi @mhaydari81
A function f: A → B is linear only if, for every x and y in the domain A, it has the following property:
f(x) + f(y) = f(x + y)
ReLU is not linear: f(−1) + f(1) = 0 + 1 = 1, while f(0) = 0, so f(−1) + f(1) ≠ f(0). It is used precisely to introduce non-linearity into neural networks. While it may seem like ReLU leaves the network linear, the depth of the network and the training process enable it to learn non-linear representations effectively.
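As a quick sanity check, here is a minimal NumPy sketch (not from the course materials) that verifies the counterexample above numerically:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

# Additivity check for f(x) + f(y) = f(x + y) at x = -1, y = 1
print(relu(-1) + relu(1))  # 0 + 1 = 1
print(relu(-1 + 1))        # relu(0) = 0  -> not equal, so ReLU is not linear
```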
To show that a ReLU activation introduces non-linearity, consider two inputs x_1 and x_2 and their corresponding outputs y_1 and y_2 after passing through the convolutional layer and the ReLU activation:
y_1 = ReLU(Wx_1 + b)
y_2 = ReLU(Wx_2 + b)
Now, let's consider a linear combination of these outputs, with scalar coefficients \alpha and \beta (renamed so they don't clash with the bias b):
\begin{align*} \alpha y_1 + \beta y_2 &= \alpha\, \text{ReLU}(Wx_1 + b) + \beta\, \text{ReLU}(Wx_2 + b) \\ &= \alpha \max(0, Wx_1 + b) + \beta \max(0, Wx_2 + b) \end{align*}
The max function is non-linear, so in general this is not equal to ReLU(W(\alpha x_1 + \beta x_2) + b). The layer does not preserve linear combinations of its inputs, which shows that the ReLU activation introduces non-linearity into the CNN.
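Here is a minimal NumPy sketch of that argument; the dense affine map stands in for a convolution, and the shapes and coefficients are just illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0, z)

# A tiny "conv-like" affine layer followed by ReLU (random weights,
# so some pre-activations are negative and get clipped).
W = rng.standard_normal((3, 4))
b = rng.standard_normal(3)

def layer(x):
    return relu(W @ x + b)

x1 = rng.standard_normal(4)
x2 = rng.standard_normal(4)
a, c = 2.0, -1.5  # arbitrary scalar coefficients

lhs = a * layer(x1) + c * layer(x2)  # linear combination of the outputs
rhs = layer(a * x1 + c * x2)         # layer applied to the combined input

print(np.allclose(lhs, rhs))  # almost surely False: the layer is not linear
```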
Link for more information:
Hope the explanation above helps, feel free to ask if you have any questions.
Thank you for your detailed response.
Well, let's assume that w, b, and x are all already greater than zero. Then ReLU doesn't have any effect in our equation, and that makes our equation linear!
So I have a question:
in every equation, does there have to be a negative value (among w, b, and x) for ReLU to actually act as a non-linearity in our equation?
Hi @mhaydari81 ,
ReLU is a piecewise linear function: a positive input is passed through unchanged, and a negative input is mapped to zero.
During the training phase, the parameters (W and b) are updated and can take negative values, so the pre-activations are not guaranteed to stay positive. I think you are only considering the positive branch of ReLU, and that is what confuses you. If you treat it as @Kic mentioned, your problem will be solved! The short sketch below illustrates the point.
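A minimal sketch of that scenario, with hypothetical parameter values chosen only for illustration: even when every input is positive, a learned negative weight or bias can push the pre-activation below zero, so ReLU's kink becomes active.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# All inputs are positive, as in the scenario discussed above.
x = np.array([0.5, 1.0, 2.0])

# But learned parameters can be negative (hypothetical values, just for illustration).
w = np.array([-1.2, -0.3, 0.2])
b = -0.5

z = w @ x + b
print(z)        # -1.0: the pre-activation is negative even though every input is positive
print(relu(z))  # 0.0: ReLU clips it, so the unit really does act non-linearly
```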
There are other ReLU variants that can be helpful to take a look at (a small sketch of each follows the list):
- ReLU
- Leaky ReLU
- Parametric ReLU
- Exponential Linear Unit
- Scaled Exponential Linear Unit
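For reference, here are minimal NumPy sketches of those variants. The default constants follow the commonly published values; treat them as illustrative, not as the course's own definitions.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Small fixed slope for negative inputs
    return np.where(x > 0, x, alpha * x)

def prelu(x, alpha):
    # Same shape as Leaky ReLU, but alpha is a learned parameter
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # Smooth exponential curve for negative inputs
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def selu(x, scale=1.0507009873554805, alpha=1.6732632423543772):
    # ELU scaled so activations are approximately self-normalizing
    return scale * np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (relu, leaky_relu, elu, selu):
    print(f.__name__, f(z))
```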
Thank you very much for your assistance.
You're welcome! Happy to help.
Good point.
This is the key concept that isn’t initially obvious.
Typically ReLU is presented as a function of some value z, but it isn't always emphasized that z = w*x + b. There are weights and a bias that must be learned within a unit that uses a ReLU activation.
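A tiny sketch of that point, with hypothetical w and b values: the ReLU kink sits at x = -b/w, so where the non-linearity "acts" is itself something the unit learns.

```python
import numpy as np

def relu_unit(x, w, b):
    # A single unit: learned affine map z = w*x + b, then ReLU
    return np.maximum(0.0, w * x + b)

x = np.linspace(-2, 2, 5)

# Two hypothetical settings of the learned parameters: both the location of
# the kink (x = -b/w) and the slope change as w and b are learned.
print(relu_unit(x, w=1.0, b=0.0))   # kink at x = 0
print(relu_unit(x, w=-2.0, b=1.0))  # kink at x = 0.5, clips the right-hand side instead
```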