General Question

paulinpaloalto · March 3, 2022, 5:36pm

It’s an interesting question. Trig functions are most likely not useful. Using a periodic function means you’re saying that lots of different inputs that may be far apart have the same result. How would that be useful in this case? Notice that most of the activation functions we have seen are monotonic non-decreasing. I don’t think that is strictly necessary, though, since swish is commonly used and has one region where it decreases a bit.
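To make those two observations concrete, here is a small sketch in plain Python. The helper names `sigmoid` and `swish` and the sample points are just my choices for illustration, and swish is taken in its usual form x * sigmoid(x).

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def swish(x: float) -> float:
    # swish(x) = x * sigmoid(x)
    return x * sigmoid(x)

# A periodic function gives inputs that are far apart the same output.
near = math.sin(0.5)
far = math.sin(0.5 + 2 * math.pi * 100)  # 100 full periods away
print(near, far)

# swish goes down as x increases toward roughly -1.28, then goes up again.
xs = [-4.0, -2.0, -1.3, 0.0, 1.0]
ys = [swish(x) for x in xs]
for x, y in zip(xs, ys):
    print(f"swish({x:+.1f}) = {y:+.4f}")
```

The first three swish values get smaller as x grows, which is the small decreasing region, and after that the function increases.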

Note that log is monotonic increasing, but it can’t handle zero or negative inputs. So that points out another characteristic that we need: the domain of the function needs to be $(-\infty, \infty)$.
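Here is a quick way to see the domain problem, as a sketch using only the Python standard library. The `relu` helper and the sample inputs are mine, chosen only for illustration.

```python
import math

def relu(x: float) -> float:
    return max(0.0, x)

log_results = {}
for x in [-2.0, 0.0, 2.0]:
    try:
        log_results[x] = math.log(x)
    except ValueError:
        # math.log rejects zero and negative inputs
        log_results[x] = None
    print(x, log_results[x], math.tanh(x), relu(x))
```

tanh and ReLU return a value for every input, while log fails on anything that is not strictly positive, and the linear part of a layer can easily produce such values.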

Also note that the activation for the output layer is well defined: for a binary classifier, we need sigmoid, because a) we need the output to look like the probability of “yes” and b) sigmoid and the cross entropy loss function are tied together. Then for multi-class classifiers we use softmax, which you can think of as the generalization of sigmoid. The cross entropy loss function works with softmax as well.
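Both of those claims can be checked numerically. The sketch below is not from the course materials. It assumes the standard binary cross entropy loss and uses made-up values for `z` and `y`. It shows that softmax over the two logits [z, 0] gives the same number as sigmoid(z), and that the derivative of the loss with respect to z comes out as a - y, which is one way to see why sigmoid and cross entropy go together.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bce(z: float, y: float) -> float:
    # binary cross entropy applied to a = sigmoid(z)
    a = sigmoid(z)
    return -(y * math.log(a) + (1.0 - y) * math.log(1.0 - a))

z, y = 0.7, 1.0  # illustrative logit and label

# softmax with two classes reduces to sigmoid
two_class = softmax([z, 0.0])[0]
print(two_class, sigmoid(z))

# dL/dz by central difference versus the closed form a - y
eps = 1e-6
numeric_grad = (bce(z + eps, y) - bce(z - eps, y)) / (2 * eps)
analytic_grad = sigmoid(z) - y
print(numeric_grad, analytic_grad)
```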

In the hidden layers, we can choose whatever works from experience. That is the high-level point: what we’re seeing is the result of a lot of years of experimentation and these are the functions that have been found to work. But this is an experimental science: if you have some new ideas, give them a try and see what happens. Maybe you’ll find something new that works even better. Write the paper and it’ll be your name in lights!

Here’s a thread which talks about how the choice of hidden layer activation works.

Just on the general topic, here’s a thread about the fact that tanh and sigmoid are actually quite closely related mathematically.
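One standard way to state that relationship is the identity tanh(x) = 2 * sigmoid(2x) - 1, so tanh is a shifted and rescaled sigmoid. The linked thread may present it differently. This is just a quick numerical check, with sample points I picked myself.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

xs = [-3.0, -0.5, 0.0, 0.5, 3.0]
# largest difference between tanh(x) and 2 * sigmoid(2x) - 1 over the samples
max_gap = max(abs(math.tanh(x) - (2.0 * sigmoid(2.0 * x) - 1.0)) for x in xs)
print(max_gap)
```

The gap is at the level of floating point rounding error.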

