Hello, for the following activation function (excuse my poor notation)…
g(w dot x + b), where g is the sigmoid function
… is the activation function really just the sigmoid function (as I’ve heard it referred to)? Or does the activation function refer to the full composite function (i.e. it also includes the sigmoid function’s input function of w dot x + b)?
Thank you, but my question is what the exact definition of “activation function” is. Using the example you gave, is the activation function strictly the sigmoid part of it, 1/(1+exp(-z)), where z can be anything? Or is the activation function the sigmoid function with its inputs specified, 1/(1+exp(-(w*x + b)))?
Yes! The activation function is just the sigmoid part of it, and z can be anything. z = w dot x + b is the output of a layer, and we apply an activation function to that output.
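To make the split concrete, here is a minimal plain-Python sketch (not from the course material; the names `pre_activation`, `w`, `x`, and `b` and the numbers are just illustrative assumptions). The sigmoid is defined on its own and knows nothing about w, x, or b; z is computed separately and then passed in.

```python
import math

def sigmoid(z):
    # The activation function by itself: works for any real number z.
    return 1.0 / (1.0 + math.exp(-z))

def pre_activation(w, x, b):
    # z = w dot x + b, computed before any activation is applied.
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Illustrative values only.
w = [0.5, -0.25]
x = [2.0, 4.0]
b = 0.0

z = pre_activation(w, x, b)  # 0.5*2.0 + (-0.25)*4.0 + 0.0
a = sigmoid(z)               # activation applied to z
print(z, a)
```

The composite g(w dot x + b) is then just `sigmoid(pre_activation(w, x, b))`, while "the activation function" refers only to `sigmoid`.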
My understanding (so far) for, say, a dense layer with 2 units, is that the “output” of the layer (also called an “activation vector”?) would be calculated:
Each neural unit includes the activation function.
Indeed. My latest comment was asking for an explanation of why rmwkwok described
z = w dot x + b
as the output of a layer. My understanding is that the output of a layer is computed using that equation AND the activation function. Alternatively, if looking at just z = w dot x + b for a layer output, it would simply be the x (that’s the output of a previous layer). Either way, z = w dot x + b doesn’t describe the output of a layer as far as I can see.
we can see that a Dense layer can come without an activation (which is equivalent to having “linear” as its activation).
Furthermore, we can also find this standalone activation layer in the package.
Sometimes people consider the activation to be part of a dense layer, and sometimes they consider the two to be separate layers. That explains my choice of words, but both choices are understandable to me.
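As a rough illustration of the two views, here is a plain-Python sketch of a dense layer with 2 units. It is not the TensorFlow implementation; the helper `dense`, its `activation` argument, and all the numbers are assumptions made up for this example. Passing an activation gives the "activation is part of the layer" view, and leaving it out and applying the sigmoid afterwards gives the "two separate layers" view.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(x, W, b, activation=None):
    # W holds one row of weights per unit, b one bias per unit.
    z = [sum(wi * xi for wi, xi in zip(row, x)) + bj for row, bj in zip(W, b)]
    if activation is None:
        return z  # no activation: the layer outputs z itself
    return [activation(zj) for zj in z]

# Illustrative values only.
x = [1.0, 2.0]
W = [[1.0, -0.5],
     [0.0, 1.0]]
b = [0.0, -2.0]

# View 1: the activation is part of the dense layer.
fused = dense(x, W, b, activation=sigmoid)

# View 2: a dense layer without activation, followed by a separate activation step.
z = dense(x, W, b)
separate = [sigmoid(zj) for zj in z]

print(fused == separate)
```

Both routes compute the same numbers, so which one gets called "the output of the layer" is a matter of where you draw the layer boundary.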
Thanks Raymond! The tf Dense class was part of my confusion - in the course I’m taking there is an example that uses the Dense class and sets activation to “linear”, but describes the layer as having no activation function. Thanks for clarifying using that exact example! It’s too bad that having no activation function is sometimes referred to as linear activation, both in speech and in the TensorFlow library, but now I know! (And I guess it kind of makes sense.) Also, the rest of your comment really helped add context. Cheers
Follow-up: I finally learned why a linear activation function is technically different from, but equivalent to, no activation function. Linear activation is f(x) = x, i.e. no change. Sweet, it all makes sense. Thanks again for your help.
Follow up: I learned the linear activation function is f(x) = x. So performing linear activation doesn’t change anything. That’s why “no activation function” and “linear activation function” are used interchangeably - they produce equivalent results.
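A tiny plain-Python check of that equivalence, in case it helps anyone else (the function name `linear` and the sample values are just assumptions for illustration, not the library's code):

```python
def linear(z):
    # Linear (identity) activation: f(x) = x.
    return z

# Illustrative pre-activation values from some layer.
z = [-1.5, 0.0, 2.0]

with_linear = [linear(v) for v in z]  # "linear" activation applied
without_activation = z                # no activation applied at all

print(with_linear == without_activation)
```

Applying `linear` leaves every value untouched, which is why the two descriptions give the same layer output.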