AFAIK, data normalization is highly recommended for the input features. A sigmoid layer produces values in the range 0 to 1, which is fine for the next layer. But what about linear & ReLU activations? Their outputs can be arbitrarily large, so a layer with one of these activations might generate outputs on very different scales, which is not ideal for the next layer. Do we have this problem in real-world tasks, and how can it be addressed?
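To make the scale issue concrete, here is a quick toy sketch of what I mean (plain numpy, made-up numbers, not from any particular network): sigmoid squashes everything into (0, 1), while ReLU and linear outputs keep whatever scale the pre-activations happen to have.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(loc=0.0, scale=50.0, size=1000)   # pre-activations with a large, arbitrary scale

sigmoid_out = 1.0 / (1.0 + np.exp(-z))           # bounded in (0, 1)
relu_out = np.maximum(0.0, z)                    # unbounded above
linear_out = z                                   # scale passed through unchanged

print("sigmoid range:", sigmoid_out.min(), sigmoid_out.max())
print("relu range:   ", relu_out.min(), relu_out.max())
print("linear range: ", linear_out.min(), linear_out.max())
```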
Hi @tenzink ,
Here's my two pennies' worth of thought:
If the input data has been normalized, then what gets fed into the first layer is already the normalized data.
ReLU is an activation function made up of two parts: positive values pass through the linear part unchanged, while negative values pass through the non-linear part and are squashed to zero.
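In code, the two parts look roughly like this (a minimal plain-Python sketch, just to show the piecewise behaviour):

```python
def relu(z):
    # Linear part: positive values pass through unchanged.
    if z > 0:
        return z
    # Non-linear part: negative values (and zero) are squashed to zero.
    return 0.0

print([relu(z) for z in [-3.0, -0.5, 0.0, 2.0, 7.0]])  # [0.0, 0.0, 0.0, 2.0, 7.0]
```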
An activation function in a hidden layer is chosen for a purpose, so we need to consider why a particular one is used. Sigmoid is often used in the output layer for making a binary decision, since its output falls between 0 and 1 and can be read as a probability. Because ReLU is simple and efficient to compute, it has become the popular choice for hidden layers.
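A minimal sketch of that usual placement, assuming tf.keras is available (the layer sizes and input dimension are placeholders, not from the course):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10,)),              # 10 input features (placeholder)
    tf.keras.layers.Dense(32, activation="relu"),    # ReLU in the hidden layers: cheap to compute
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # sigmoid in the output layer for a binary decision
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```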
Normalization of the input features is standard practice, regardless of which method you're using (linear regression, classification, NNs, or whatever).
The reason is that normalized features allow you to use a larger learning rate and fewer iterations, so gradient descent runs more efficiently.
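For example, a small sketch of standardizing input features (z-score) before training, assuming scikit-learn is available; X_train and X_test here are placeholder arrays:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0, 2000.0], [2.0, 3000.0], [3.0, 1000.0]])  # features on very different scales
X_test = np.array([[1.5, 2500.0]])

scaler = StandardScaler()
X_train_norm = scaler.fit_transform(X_train)   # fit the mean/std on the training set only
X_test_norm = scaler.transform(X_test)         # reuse the same statistics for new data

print(X_train_norm.mean(axis=0))  # roughly zero per feature
print(X_train_norm.std(axis=0))   # roughly one per feature
```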