Interesting questions. I don’t have complete answers for all of them, but here are some thoughts for further discussion:
- There are lots of choices for activation function in the hidden layers. Here’s another recent thread about that. But it is an interesting question whether it would ever make sense to use different activation functions at different hidden layers of the network, e.g. ReLU at the earlier layers and then more expensive functions like tanh later. In all the examples I have seen in the DLS courses, Prof Ng uses the same hidden-layer activation throughout any given network, so I don’t have any experience with that idea. But this is an experimental science! You could try some experiments with this idea (see the first sketch after this list) and see if you learn anything interesting. Please let us know if you try that and what you see!
- I’m not sure what you mean here. I guess there is no reason to believe that you couldn’t come up with a scenario where two different activation functions give pretty similar results. Here again, I don’t know any specific examples from experience. If you try any experiments, let us know.
- Prof Ng presents Logistic Regression first as a “trivial” Neural Network: the output layer of a binary classifier is identical to LR. But the point is that LR is only capable of linear decision boundaries: the solution is a hyperplane in the input space that does the best job of separating the “yes” and “no” answers. So you would expect in principle that a NN can do a better job, because it is capable of non-linear decision boundaries (see the second sketch below). With an NN the cost function is no longer convex, so you have the issue that you may end up in different local minima, but since the output layer by itself is just LR, in principle the NN should do at least as well as LR once you get your hyperparameters nailed down.
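To make the mixed-activation experiment concrete, here is a minimal sketch using the Keras Sequential API. This is just my own illustration, not anything from the course notebooks; the layer sizes and the 20-feature input are placeholder assumptions you would replace with your own setup:

```python
# Sketch of the mixed-activation idea: ReLU in the earlier hidden layers,
# tanh in a later one, sigmoid output for binary classification.
# Layer sizes and input dimension are arbitrary assumptions.
import tensorflow as tf

mixed_model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),                     # assume 20 input features
    tf.keras.layers.Dense(64, activation="relu"),    # cheaper ReLU early
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(16, activation="tanh"),    # more expensive tanh later
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary classifier output
])

mixed_model.compile(optimizer="adam",
                    loss="binary_crossentropy",
                    metrics=["accuracy"])

# mixed_model.fit(X_train, y_train, epochs=20, validation_split=0.2)
# For a fair experiment, train an identical architecture that uses ReLU
# (or tanh) in every hidden layer and compare validation curves.
```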
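And for the last point, here is a quick sketch of the linear vs. non-linear decision boundary comparison. It uses scikit-learn rather than the course code just to keep it short, and the “two moons” dataset and hidden layer size are arbitrary choices on my part:

```python
# Logistic Regression (linear boundary) vs. a small neural network
# (non-linear boundary) on data that is not linearly separable.
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=1000, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

lr = LogisticRegression().fit(X_train, y_train)           # linear boundary
nn = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                   random_state=0).fit(X_train, y_train)  # non-linear boundary

print("Logistic Regression accuracy:", lr.score(X_test, y_test))
print("Small NN accuracy:           ", nn.score(X_test, y_test))

# On this kind of data the NN typically wins clearly, because no single
# hyperplane can separate the two interleaved "moons".
```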