Hi, I tried substituting a ReLU function for the tanh activation, using
def relu(x): return np.maximum(0, x)
However, it is then unable to calculate the cost:
How would this be fixed? The only change I made was to define relu and call it in place of tanh.
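In case it helps, here is roughly what my change looks like in the forward pass (a minimal sketch with made-up shapes, and the variable names W1, b1, X, Z1, A1 are from memory, not the exact notebook code):

import numpy as np

def relu(x):
    return np.maximum(0, x)

# Hypothetical sizes, just to show where the swap happens in the forward pass
np.random.seed(1)
W1 = np.random.randn(4, 2) * 0.01
b1 = np.zeros((4, 1))
X = np.random.randn(2, 3)

Z1 = np.dot(W1, X) + b1
# A1 = np.tanh(Z1)   # original hidden-layer activation
A1 = relu(Z1)        # my substitution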
It’s great that you are trying this type of experiment. You always learn something interesting when you go beyond the exercise. ReLU works fine as the hidden-layer activation in this assignment, but the point is that it takes more than one change to implement, right? Remember that the activation function also appears in back propagation: the hidden-layer gradient multiplies by the derivative of the activation, so the tanh derivative term there has to be replaced by the ReLU derivative as well.
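As a minimal sketch of what that second change involves, assuming your backward step multiplies by the tanh derivative in the usual way (the names dZ1, dZ2, W2, A1, Z1 below are illustrative; yours may differ):

import numpy as np

def relu_derivative(z):
    # ReLU gradient: 1 where the pre-activation was positive, 0 elsewhere
    return (z > 0).astype(float)

# With tanh as the hidden activation, the backward step multiplies by the
# tanh derivative, roughly:
#   dZ1 = np.dot(W2.T, dZ2) * (1 - np.power(A1, 2))
# With relu in the forward pass, that factor must become the ReLU derivative
# evaluated at the same pre-activation Z1:
#   dZ1 = np.dot(W2.T, dZ2) * relu_derivative(Z1)

If you change only the forward pass, the gradients no longer match the activation you are actually using, which is why the training loop misbehaves.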