It’s a great idea to experiment with using ReLU as the hidden layer activation in this exercise. You always learn something interesting when you try to extend the ideas in the course. Of course, you will need to change more than just the forward prop logic: the derivative of the hidden layer activation appears in back prop as well, so the tanh derivative term has to be replaced by the ReLU derivative (see the sketch below).
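Here is a minimal sketch of the two changes for a single hidden layer, using the course's notation (W1, b1, A1, dZ2, etc.) but with made-up shapes and random data, so the exact names and dimensions in your notebook may differ:

```python
import numpy as np

def relu(Z):
    """ReLU activation, applied elementwise."""
    return np.maximum(0, Z)

def relu_derivative(Z):
    """Derivative of ReLU: 1 where Z > 0, else 0 (the value at 0 is a convention)."""
    return (Z > 0).astype(float)

# Tiny demo with made-up shapes: n_x inputs, n_h hidden units, m examples
np.random.seed(1)
n_x, n_h, n_y, m = 2, 10, 1, 5
X = np.random.randn(n_x, m)
Y = (np.random.rand(n_y, m) > 0.5).astype(float)
W1, b1 = np.random.randn(n_h, n_x) * 0.01, np.zeros((n_h, 1))
W2, b2 = np.random.randn(n_y, n_h) * 0.01, np.zeros((n_y, 1))

# Forward prop change: hidden layer uses ReLU instead of tanh
Z1 = np.dot(W1, X) + b1
A1 = relu(Z1)                      # was: A1 = np.tanh(Z1)
Z2 = np.dot(W2, A1) + b2
A2 = 1 / (1 + np.exp(-Z2))         # output layer stays sigmoid

# Back prop change: the tanh term (1 - A1**2) is replaced by the ReLU derivative
dZ2 = A2 - Y
dW2 = np.dot(dZ2, A1.T) / m
db2 = np.sum(dZ2, axis=1, keepdims=True) / m
dZ1 = np.dot(W2.T, dZ2) * relu_derivative(Z1)   # was: * (1 - np.power(A1, 2))
dW1 = np.dot(dZ1, X.T) / m
db1 = np.sum(dZ1, axis=1, keepdims=True) / m
```

Note that everything downstream of dZ1 (dW1, db1 and the gradient descent updates) is unchanged; only the activation and its derivative are swapped.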
There have been a couple of other threads about this in the past, e.g. this one. I was able to get pretty good accuracy using ReLU, but it takes quite a few more hidden units to get results equivalent to what tanh gives with only 4 hidden units.
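If you want to try that comparison yourself, a rough sweep over hidden layer sizes is one way to do it. The sketch below is meant to run inside the assignment notebook and assumes you have already swapped ReLU into your nn_model(), and that the assignment's predict() helper and the X, Y data are in scope, so treat it as a rough guide rather than drop-in code:

```python
import numpy as np

# Hypothetical sweep over hidden layer sizes, run in the notebook after
# modifying nn_model() to use ReLU in the hidden layer.
for n_h in [4, 10, 20, 50]:
    parameters = nn_model(X, Y, n_h, num_iterations=10000)
    predictions = predict(parameters, X)
    accuracy = float(np.mean(predictions == Y)) * 100
    print(f"{n_h} hidden units: {accuracy:.1f}% training accuracy")
```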