For RMSProp, the lecture notes say:

W = W - learning_rate * dW / sqrt(SdW + epsilon)

The lab says:

W = W - learning_rate * dW / (sqrt(SdW) + epsilon)

Are both acceptable to use in practice?

The goal of adding epsilon is to avoid a division-by-zero math error.

The lecture notes are incorrect.
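
To make the difference between the two denominators concrete, here is a minimal scalar sketch in Python. It is only an illustration, not the course's code: the function name `rmsprop_step`, the flag `eps_inside_sqrt`, the default values for `learning_rate`, `beta`, and `epsilon`, and the usual moving-average update for SdW are all assumptions on my part.

```python
import math

def rmsprop_step(w, dw, s_dw, learning_rate=0.01, beta=0.9,
                 epsilon=1e-8, eps_inside_sqrt=False):
    """One scalar RMSProp update; returns (new_w, new_s_dw).

    Hypothetical helper for illustration only.
    """
    # Assumed standard moving average of the squared gradient.
    s_dw = beta * s_dw + (1.0 - beta) * dw * dw
    if eps_inside_sqrt:
        denom = math.sqrt(s_dw + epsilon)   # form quoted from the lecture notes
    else:
        denom = math.sqrt(s_dw) + epsilon   # form quoted from the lab
    return w - learning_rate * dw / denom, s_dw

# Edge case: zero gradient and zero running average.
w_lab, _ = rmsprop_step(1.0, 0.0, 0.0)
w_notes, _ = rmsprop_step(1.0, 0.0, 0.0, eps_inside_sqrt=True)

# Typical case: a non-zero gradient, where epsilon is tiny next to sqrt(SdW).
w_lab_typical, _ = rmsprop_step(1.0, 0.5, 0.0)
w_notes_typical, _ = rmsprop_step(1.0, 0.5, 0.0, eps_inside_sqrt=True)

# Smallest value each denominator can take when SdW is exactly 0.
floor_lab = math.sqrt(0.0) + 1e-8
floor_notes = math.sqrt(0.0 + 1e-8)
```

With the same epsilon, the smallest possible denominator differs between the two forms: epsilon for the lab version and sqrt(epsilon) for the lecture-notes version. For gradients of ordinary size the two updates are nearly identical.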

Thanks for the clarification.