In addition to 2):
If Adam does not converge well, AMSGrad might be worth a look; see also:
https://johnchenresearch.github.io/demon/
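If you happen to be using PyTorch, AMSGrad is just a flag on the built-in Adam optimizer. A minimal sketch (the model and learning rate here are only placeholders):

```python
import torch

# Placeholder model; any nn.Module works the same way.
model = torch.nn.Linear(10, 1)

# Setting amsgrad=True switches Adam to the AMSGrad variant.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, amsgrad=True)
```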
The linked page also explains some other algorithms, such as QHM (Quasi-Hyperbolic Momentum), which decouples the momentum term from the current gradient in the weight update; that can also be beneficial! A small hand-written sketch of the update rule follows below.
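In case you want to try QHM directly, here is a minimal hand-written sketch of one update step on top of plain PyTorch. The `qhm_step` helper and its hyperparameter values (`lr`, `beta`, `nu`) are just illustrative choices, not a library API:

```python
import torch

def qhm_step(params, grads, bufs, lr=0.1, beta=0.999, nu=0.7):
    """One QHM update, applied in place.

    The update is a weighted average of the raw gradient and the
    momentum buffer, which is what decouples the two:
        buf   <- beta * buf + (1 - beta) * grad
        param <- param - lr * ((1 - nu) * grad + nu * buf)
    nu = 1 recovers momentum SGD, nu = 0 plain SGD.
    """
    with torch.no_grad():
        for p, g, buf in zip(params, grads, bufs):
            buf.mul_(beta).add_(g, alpha=1 - beta)       # update momentum buffer
            p.add_((1 - nu) * g + nu * buf, alpha=-lr)   # decoupled weight update


# Tiny usage example: minimize a simple quadratic.
w = torch.randn(3, requires_grad=True)
buf = [torch.zeros_like(w)]
for _ in range(100):
    loss = (w ** 2).sum()
    loss.backward()
    qhm_step([w], [w.grad], buf)
    w.grad = None
```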
Best regards
Christian