For RMSProp, the lecture notes say:
W = W - learning_rate * dW / sqrt(SdW + epsilon)
The lab says:
W = W - learning_rate * dW / (sqrt(SdW) + epsilon)
Are both acceptable to use in practice?
The goal of adding epsilon is to avoid a division-by-zero math error.
The lecture notes are incorrect.
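To make the role of epsilon concrete, here is a minimal sketch of one update step using the lab's form. The function name rmsprop_step, the scalar weight, the default values, and the running-average update for SdW (with a decay rate beta) are assumptions for illustration and are not taken from the lab code.

```python
import math

def rmsprop_step(w, dw, s_dw, learning_rate=0.01, beta=0.9, epsilon=1e-8):
    """One RMSProp update for a single scalar weight (illustrative only)."""
    # Running average of squared gradients (assumed standard RMSProp form).
    s_dw = beta * s_dw + (1.0 - beta) * dw * dw
    # Lab's placement: epsilon is added after taking the square root.
    denom = math.sqrt(s_dw) + epsilon
    return w - learning_rate * dw / denom, s_dw

# Zero gradient and zero history: the denominator is epsilon, not 0.
w, s = rmsprop_step(1.0, 0.0, 0.0)

# With epsilon = 0 the same call divides 0.0 by 0.0 and Python raises.
try:
    rmsprop_step(1.0, 0.0, 0.0, epsilon=0.0)
    raised = False
except ZeroDivisionError:
    raised = True
```

With plain Python floats the missing epsilon shows up as a ZeroDivisionError. With NumPy arrays the same 0/0 would typically produce nan and a runtime warning instead, which then spreads through the weights.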
Thanks for the clarification.