I just had a general question. With backpropagation and gradient descent, we tweak the weights and biases to minimize the average error over all training examples. Are there any techniques that use backpropagation and gradient descent to tweak the activation functions themselves? For example, the constants used inside the activation functions, or something along those lines.
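To make the idea concrete, here is a rough sketch of what I have in mind (this assumes a PyTorch-style framework; the `LearnableLeakyReLU` class and its `init_slope` parameter are just my own illustration, not an existing technique I know of): an activation whose "constant" is registered as a trainable parameter, so the same gradient descent step that updates weights and biases would also update it.

```python
import torch
import torch.nn as nn

class LearnableLeakyReLU(nn.Module):
    """Leaky-ReLU-like activation whose negative slope is a trainable parameter.

    Because the slope is an nn.Parameter, backpropagation computes a gradient
    for it and the optimizer updates it alongside the weights and biases.
    """
    def __init__(self, init_slope: float = 0.25):
        super().__init__()
        self.slope = nn.Parameter(torch.tensor(init_slope))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Identity for non-negative inputs, learnable slope for negative ones.
        return torch.where(x >= 0, x, self.slope * x)

# Quick check: the slope receives a gradient like any other parameter.
act = LearnableLeakyReLU()
x = torch.randn(8)
loss = act(x).sum()
loss.backward()
print(act.slope.grad)  # non-None, so an optimizer could tweak the slope
```

Is there an established name or body of work for this kind of thing?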
Thanks