Suppose you take the same cost function and the same architecture and train two separate models, one using ReLU and the other using tanh for the hidden layers, for enough iterations (assuming the output activation function is the same for both):
Would you still end up with the same parameters in the final model because the cost function is the same, or does the choice of hidden-layer activations affect the final parameters?
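To make the setup concrete, here is a minimal sketch of the experiment being asked about: two networks with identical initial weights and the same mean-squared-error cost, differing only in the hidden activation, trained with full-batch gradient descent on toy data. Everything here (the data, the one-hidden-layer architecture, the `train` helper) is a hypothetical illustration, not a reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data, shared by both models
X = rng.normal(size=(200, 3))
y = np.sin(X[:, :1]) + 0.1 * rng.normal(size=(200, 1))

def train(activation, activation_grad, steps=5000, lr=0.05):
    # Same seed -> identical initial parameters for both runs
    w_rng = np.random.default_rng(42)
    W1 = w_rng.normal(scale=0.5, size=(3, 8))
    b1 = np.zeros((1, 8))
    W2 = w_rng.normal(scale=0.5, size=(8, 1))
    b2 = np.zeros((1, 1))
    for _ in range(steps):
        # Forward pass: one hidden layer, linear output
        z1 = X @ W1 + b1
        a1 = activation(z1)
        yhat = a1 @ W2 + b2
        # Gradient of the MSE cost w.r.t. the output
        grad_yhat = 2 * (yhat - y) / len(X)
        # Backward pass
        gW2 = a1.T @ grad_yhat
        gb2 = grad_yhat.sum(axis=0, keepdims=True)
        gz1 = (grad_yhat @ W2.T) * activation_grad(z1)
        gW1 = X.T @ gz1
        gb1 = gz1.sum(axis=0, keepdims=True)
        # Gradient-descent updates
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return W1, W2

relu = lambda z: np.maximum(z, 0)
relu_grad = lambda z: (z > 0).astype(float)
tanh_grad = lambda z: 1 - np.tanh(z) ** 2

W1_relu, W2_relu = train(relu, relu_grad)
W1_tanh, W2_tanh = train(np.tanh, tanh_grad)

# Compare the learned hidden-layer weights: any nonzero difference
# means the activation choice changed the final parameters
print(np.abs(W1_relu - W1_tanh).max())
```

Only the hidden activation and its derivative differ between the two calls, so any difference in the printed value comes from that choice alone.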