It is great that you are trying this type of experiment! You always learn something interesting when you extend the material like this. It is possible to get almost as good performance using ReLU, but it requires a lot more than 4 neurons in the hidden layer. It was also a good idea to experiment with the other hyperparameters, like the learning rate and the number of iterations: there is no guarantee that the combination that worked well with tanh will work with the other activation functions. Mathematically, tanh and sigmoid are closely related, so I would expect you could get essentially the same results with sigmoid after a little tweaking of the learning rate and number of iterations. Here’s a thread about the relationship between tanh and sigmoid.
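
For reference, the relationship comes down to the identity tanh(z) = 2 * sigmoid(2z) - 1. Here is a minimal sketch that checks it numerically. The helper names `sigmoid` and `tanh_via_sigmoid` are just ones I made up for illustration, not functions from the assignment.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh_via_sigmoid(z):
    """tanh rebuilt from sigmoid using tanh(z) = 2 * sigmoid(2z) - 1."""
    return 2.0 * sigmoid(2.0 * z) - 1.0

# Compare against math.tanh on a grid of points from -5.0 to 5.0.
zs = [i / 10.0 for i in range(-50, 51)]
max_abs_diff = max(abs(math.tanh(z) - tanh_via_sigmoid(z)) for z in zs)

# The difference should be at the level of floating point rounding error.
print(max_abs_diff)
```

So a tanh unit is a sigmoid unit with its input scaled by 2 and its output stretched from (0, 1) to (-1, 1). The slope at zero is also different (1 for tanh versus 0.25 for sigmoid), which is one plausible reason the learning rate that suits tanh may need adjusting for sigmoid.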

But ReLU is a different matter. Here’s an earlier thread that gives some results other students have gotten applying ReLU to this problem.

Thanks for sharing the results of your experiments! This is all an experimental science!