How about both using either ReLU or LeakyReLU?
Why don’t you try that and see what happens? The point is that the choice of activation functions is a “hyperparameter”, meaning a choice that you need to make as the system designer. The choice depends on the circumstances, i.e., on what actually works for your problem. So when you see a case like this, your default assumption should be that they tried using ReLU in the second case and it didn’t work very well.
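Just to make the “try it and see” idea concrete, here is a minimal sketch (in PyTorch, with made-up layer sizes and a hypothetical `build_mlp` helper) of treating the activation as a hyperparameter: build the same network with each candidate activation, train each one, and keep whichever does better on your validation set.

```python
import torch.nn as nn

def build_mlp(activation: nn.Module) -> nn.Sequential:
    # Small MLP where the hidden activation is the hyperparameter we sweep over.
    # The 20-64-64-1 sizes are just placeholders for illustration.
    return nn.Sequential(
        nn.Linear(20, 64),
        activation,
        nn.Linear(64, 64),
        activation,
        nn.Linear(64, 1),
    )

# Candidate activations to compare; LeakyReLU's slope is the PyTorch default.
candidates = {
    "relu": nn.ReLU(),
    "leaky_relu": nn.LeakyReLU(negative_slope=0.01),
}

for name, act in candidates.items():
    model = build_mlp(act)
    # ... train `model` here, evaluate on a held-out validation set,
    # and keep the activation whose validation metric is best.
    print(name, model)
```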
Here’s a thread over in DLS that talks about the natural hierarchy of activation functions and the order in which to try them.
Thank you very much!