When talking about RMSprop in the video, Professor Wu explained how it can damp out the oscillations in the vertical direction (db) and move faster in the horizontal direction (dw). But what if dw is large and db is pretty small? Would that slow down horizontal learning and cause more vertical oscillation? That's not what we want, right?
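You are correct. RMSprop divides each gradient by the square root of a running average of its own squares, so a large dw gets scaled down and a small db gets scaled up. That is why we usually normalize the features, so that their magnitudes are all similar.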
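To make the scaling concrete, here is a minimal sketch of one common form of the RMSprop update, written with NumPy. The function name `rmsprop_step`, the hyperparameter values, and the constant gradients (a large dw and a small db) are assumptions chosen for illustration, not something taken from the video.

```python
import numpy as np

def rmsprop_step(grad, s, lr=0.01, beta=0.9, eps=1e-8):
    """One RMSprop update (illustrative sketch, hypothetical helper).

    grad: current gradient, s: running average of squared gradients.
    Returns the step to subtract from the parameters and the updated s.
    """
    s = beta * s + (1 - beta) * grad ** 2   # running average of grad^2
    step = lr * grad / (np.sqrt(s) + eps)   # divide by root of that average
    return step, s

# Assumed scenario from the question: dw is large, db is small.
grads = np.array([10.0, 0.1])  # [dw, db]
s = np.zeros(2)

# Feed the same gradients repeatedly so the running average settles.
for _ in range(100):
    step, s = rmsprop_step(grads, s)

print("step for w:", step[0])
print("step for b:", step[1])
```

With constant gradients, both steps settle close to `lr`, even though dw is 100 times larger than db. In other words, RMSprop shrinks the large dw and enlarges the small db, which is exactly the situation described in the question, and why keeping the features on similar scales helps.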