Why do we need to scale A after dropout

In week 1, Prof Ng mentioned we need to scale the A matrix after dropout to keep E[A] constant. I am trying to understand why we need to keep E[A] constant in the first place.


The problem is that neurons are only dropped during training. Dropping units with probability (1 - keep_prob) shrinks the expected value of the activations by a factor of keep_prob, so if you didn't divide by keep_prob to compensate, the activations your network sees at inference time (when nothing is dropped) would be systematically larger than the ones it was trained on, and the downstream layers would be miscalibrated. I think it's clear why that would be a problem, but let me know if it isn't.
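Concretely, this is the "inverted dropout" trick: zero out units with a random mask, then divide by keep_prob so the expectation is unchanged. A minimal NumPy sketch (the variable names here are just illustrative, not from the course code):

```python
import numpy as np

np.random.seed(0)
keep_prob = 0.8
A = np.random.rand(4, 5)  # example activations from some layer

# Drop each unit independently with probability 1 - keep_prob,
# then rescale the survivors by 1 / keep_prob.
mask = np.random.rand(*A.shape) < keep_prob
A_dropped = A * mask / keep_prob

# On average, A_dropped has the same expectation as A:
# each entry survives with probability keep_prob and is
# scaled up by 1 / keep_prob, so E[A_dropped] = E[A].
print(A.mean(), A_dropped.mean())
```

At inference you just use A as-is, with no mask and no scaling, and the magnitudes match what the network saw in training.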

Good luck with course 2 :slight_smile: