If I understand correctly, activation functions serve as a boundary of sorts that classifies inputs based on the function's output. Would it be feasible to use a planar activation function, something like a sigmoid along the x and y axes and a ReLU along the x and z axes? One possible application that comes to mind: differentiating between “cat” and “dog” with the sigmoidal component, then classifying the species of cat or dog with the ReLU. Could multiple layers be condensed by higher-dimensional activation functions?
Hi @rwjohnson42, in my understanding the main purpose of the activation function is to introduce a nonlinearity into the architecture of a neural network. This is essential because, without nonlinearities, any architecture that is a composition of linear transformations can be algebraically simplified to a single linear relation between inputs and outputs. Given the complexity and size of deep neural networks, it is convenient to have activation functions with certain characteristics, such as being simple to differentiate (among many others). I think the goal of those who conceived the deep NN architecture was to have a large number of not-too-complicated building blocks. Nevertheless, this does not mean there cannot be other architectures and approaches. In any case, I am not sure I understand how the proposed higher-dimensional function would look in a neural network structure.
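To make the linearity point concrete, here is a minimal NumPy sketch (the layer sizes and random values are arbitrary, chosen just for illustration): two stacked linear layers collapse exactly into one, while inserting a ReLU between them breaks that simplification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" with no activation: weights W1, W2 and biases b1, b2.
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)

x = rng.standard_normal(3)

# Forward pass through both linear layers, no activation in between.
two_layers = W2 @ (W1 @ x + b1) + b2

# The same map simplifies algebraically to a single linear layer:
# W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # True

# Inserting a nonlinearity (e.g. ReLU) breaks the simplification:
relu = lambda z: np.maximum(z, 0)
with_relu = W2 @ relu(W1 @ x + b1) + b2
print(np.allclose(with_relu, one_layer))   # False (in general)
```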
Hi @rwjohnson42. You will also encounter “multinomial classification” later in the Specialization, in which the network is tasked with discriminating between multiple classes (e.g. cat, dog, human, neither). Those neural networks use a simple generalization of the logistic (sigmoidal) function, the softmax, in the output layer. They do not involve mixing different types of activations across classes. Remember, at least for classification tasks, we want the activation function to deliver a probability, which ReLU will not do.
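As a small illustration of that last point, here is a rough NumPy sketch (the four-class scores are made up for the example): the softmax turns arbitrary output-layer scores into a valid probability distribution, whereas ReLU leaves non-negative values that do not sum to 1.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the outputs sum to 1.
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Made-up output-layer scores for four classes: cat, dog, human, neither.
z = np.array([2.0, 1.0, 0.5, -1.0])

probs = softmax(z)
print(probs, probs.sum())        # a valid probability distribution, sums to 1.0

relu_out = np.maximum(z, 0)
print(relu_out, relu_out.sum())  # non-negative values, but not probabilities
```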
A very interesting speculation on your part. Your curiosity will be rewarded down the line!
Thank you for the info. I may be getting ahead of myself, and this idea may unnecessarily complicate the network. I do not understand how it would look either. I may revisit it when I have learned more, to see if it could be of any value.
I understand. Thank you for the information! It may be something to revisit for tasks other than classification, but I have not learned that much yet.
I am curious as to how it will be rewarded!