Hello everyone,

I am going through DL course 1. Thanks to some previous background, the math and algorithms make sense to me, but there is a practical, very important concept that I can’t easily wrap my head around.

How do you know which activation function makes sense for a given problem? I have read that, in the end, a multilayer NN could probably approximate any data reasonably well, but with some activation functions it could take longer to get there.

Still, I am interested in the case of either a single layer, or a multilayer network where you try to use some mathematical intuition to decide which function to use.

In a broad sense, it seems that people would use a linear activation function for a continuous-valued output (prices, weights, coordinates) without caring whether the data is line-like or not (see previous paragraph).

They would use a sigmoid function for a binary (or maybe multi-class) classification problem, for the convenience of restricting the output to the 0-1 range, etc.
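
To make my current understanding concrete, here is a minimal sketch in plain Python of the two output activations I mean. The pre-activation values are made up for illustration; they are not from the course.

```python
import math

def linear(z):
    # Identity activation: the output is unbounded, so it can be any real value
    # (which is why it seems to suit prices, weights, coordinates).
    return z

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1),
    # which is convenient to read as a probability for a binary label.
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical pre-activation values (w.x + b) of a single output unit.
pre_activations = [-3.0, 0.0, 3.0]

linear_out = [linear(z) for z in pre_activations]
sigmoid_out = [sigmoid(z) for z in pre_activations]

print("linear: ", linear_out)
print("sigmoid:", sigmoid_out)
```

So the linear output just passes the value through, while the sigmoid output always stays strictly between 0 and 1, with 0 mapping to exactly 0.5.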

Is there any video, page, blog, book, or other resource (maybe your own experience) you would recommend that shows some examples, and how more knowledgeable people think about these functions, without being too complicated for a beginner?

I am aware that practice will show me a lot of tricks for it, but I’d like to see what a more educated person says about it.

Thanks.