I am a Japanese student and my English is not very good.

My teacher said that if you don’t use an activation function in a neural network, the network just computes a linear function of its input. I think the same is true when a linear function is used as the activation function.
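A minimal sketch of that claim, assuming a tiny two-layer network with made-up weights (`W1`, `b1`, `W2`, `b2` and the helper names are hypothetical, chosen only for illustration): two layers with no activation give the same output as one linear layer with weights `W2·W1` and bias `W2·b1 + b2`.

```python
def matvec(w, x):
    # Multiply matrix w (list of rows) by vector x.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def matmul(a, b):
    # Multiply matrix a by matrix b (both lists of rows).
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(u, v):
    # Element-wise sum of two vectors.
    return [a + b for a, b in zip(u, v)]

# Made-up weights for illustration only.
W1 = [[1.0, -2.0], [0.5, 3.0]]
b1 = [1.0, -1.0]
W2 = [[2.0, 1.0]]
b2 = [0.5]

def two_layers(x):
    # Layer 1 and layer 2 with no activation in between.
    h = add(matvec(W1, x), b1)
    return add(matvec(W2, h), b2)

# The same mapping collapsed into a single linear layer.
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)

def one_layer(x):
    return add(matvec(W, x), b)

x = [3.0, 4.0]
print(two_layers(x), one_layer(x))
```

Both calls print the same value for any input `x`, so the second layer adds no expressive power here.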

He also said that the ReLU function is often used as the activation function, and that z is mostly in the z > 0 range. If z > 0, ReLU behaves like a linear function in most cases, so I thought the meaning of stacking layers would be lost.

Hello @kaito.0301.1102, the key is not to focus on only the z > 0 range or only the z < 0 range. We need to understand ReLU as a whole: it has a slope = 0 region when z < 0 and a slope = 1 region when z > 0. In practice, as you train your NN, z can move in and out of the two ranges, and since a multi-layer NN has more than one neuron, the z’s will not all stay in just one of the two ranges. Therefore, stacking layers does not lose its meaning, because the difference in slope between the two regions gives us the non-linearity.
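Here is a small sketch of that idea, assuming a hypothetical one-input network with two hidden ReLU neurons whose weights are +1 and -1 (`tiny_net` and its weights are made up for illustration). For any non-zero input, one neuron has z > 0 and the other has z < 0, so the two slope regions are both in use and the output is |x|, which is not a linear function.

```python
def relu(z):
    # Slope 0 for z < 0, slope 1 for z > 0.
    return max(0.0, z)

def tiny_net(x):
    # Two hidden neurons with made-up weights +1 and -1, no biases.
    z1 = 1.0 * x
    z2 = -1.0 * x
    # The output layer sums the two activations.
    return relu(z1) + relu(z2)

for x in (-2.0, 0.0, 2.0):
    print(x, tiny_net(x))

# A linear (affine) function g would satisfy g(2) + g(-2) == 2 * g(0).
print(tiny_net(2.0) + tiny_net(-2.0), 2 * tiny_net(0.0))
```

The last line prints two different numbers, which a purely linear network could never produce.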