I’m going to make some assumptions because I’m not entirely sure of the full context of these statements (i.e. which lessons in particular).
About statement #1: momentum might not be very useful, or may have a negligible impact, on small datasets or when the learning rate is very small. I believe you can look at it this way: momentum helps you converge faster by adding a boost to the gradient step. However, if the gradient step is tiny, the boost will also be tiny, hence the “negligible” in the statement.
Check out this article, which I find interesting: https://distill.pub/2017/momentum
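To make the “boost” idea concrete, here is a minimal sketch of SGD with momentum in plain NumPy (this is not the exact implementation from the lessons, and the lr/beta values are just placeholders):

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    """One update: velocity accumulates past gradients, giving the 'boost'."""
    velocity = beta * velocity + grad   # running accumulation of gradients
    w = w - lr * velocity               # step is still scaled by the learning rate
    return w, velocity

w = np.zeros(3)
v = np.zeros(3)
grad = np.array([0.1, -0.2, 0.05])

# With a very small learning rate, lr * velocity stays tiny,
# so the extra push from momentum is barely noticeable.
w, v = sgd_momentum_step(w, grad, v, lr=1e-5)
print(w, v)
```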
About statement #2: this just refers to Adam optimization being very effective out of the box. Adam uses some clever tricks to choose a good step size on a per-parameter basis (i.e. per weight), based on how quickly each parameter is changing, instead of relying on a single static learning rate such as alpha.
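To illustrate the per-parameter step size, here is a rough NumPy sketch of one Adam update (the hyperparameters shown are just the commonly used defaults, not anything specific to the lessons). Parameters whose gradients have been consistently large get a smaller effective step, and vice versa:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad**2     # running mean of squared gradients
    m_hat = m / (1 - beta1**t)                # bias correction for early steps
    v_hat = v / (1 - beta2**t)
    # Each parameter gets its own effective step size: lr scaled by its history.
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.zeros(3)
m = np.zeros(3)
v = np.zeros(3)
grad = np.array([0.1, -0.2, 0.05])
w, m, v = adam_step(w, grad, m, v, t=1)
print(w)
```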