I tried replacing the tanh activation function with sigmoid and ReLU in the hidden layers. With sigmoid and the same hyperparameters, the model achieved slightly lower accuracy (89% instead of 90%). With ReLU, accuracy was much lower (~60%).
I even tried reducing the learning rate, increasing the number of iterations, and doubling the number of hidden nodes.
Has anyone had similar results?
I expected better performance using ReLU.
It is great that you are trying this type of experiment! You always learn something interesting when you extend the material and try things like this. It is possible to get nearly as good performance using ReLU, but it requires many more than 4 neurons in the hidden layer. It is also great that you experimented with other hyperparameters like the learning rate and the number of iterations: there is no guarantee that the combination that worked well with tanh will work with the other activations. There is actually quite a close mathematical relationship between tanh and sigmoid, so I would expect you could get essentially the same results with sigmoid after a little tweaking of the learning rate and number of iterations. Here’s a thread about the relationship between tanh and sigmoid.
But ReLU is a different matter. Here’s an earlier thread that gives some results other students have gotten applying ReLU to this problem.
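On the tanh/sigmoid relationship mentioned above: the standard identity is tanh(z) = 2·sigmoid(2z) − 1, so tanh is a rescaled and shifted sigmoid. Here is a minimal NumPy sketch that checks this numerically. The helper name `sigmoid` and the test range are my own choices for illustration and are not taken from the assignment code.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, 1 / (1 + exp(-z)). Illustrative helper, not the course's implementation."""
    return 1.0 / (1.0 + np.exp(-z))

# Sample points over a moderate range (arbitrary choice for this check).
z = np.linspace(-5.0, 5.0, 101)

lhs = np.tanh(z)                    # tanh evaluated directly
rhs = 2.0 * sigmoid(2.0 * z) - 1.0  # the same function written via sigmoid

# Largest pointwise gap between the two forms; should be at floating-point noise level.
max_abs_diff = np.max(np.abs(lhs - rhs))
identity_holds = np.allclose(lhs, rhs)

print("identity holds:", identity_holds)
print("max abs difference:", max_abs_diff)
```

Because the two functions differ only by this scaling of the input and output, a sigmoid hidden layer can in principle represent the same functions as a tanh one, but the effective scale of the weights and gradients changes. That is one plausible reason the same learning rate does not transfer unchanged.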
Thanks for sharing the results of your experiments! This is all an experimental science!