Why rescale Z values in inverted dropout?

It’s a good question that has been asked and answered a number of times before. Please have a look at the posts on this thread, from the linked post to the end, and see if that covers your question.

The short summary is that dropout only happens at training time. When you actually use the network to make predictions, you just use the trained weights with no dropout applied. If you don’t compensate for the dropout during training by scaling up the surviving activations, then the later layers of the network are trained to expect less “energy” (a smaller expected activation magnitude) than they actually receive from the previous layers when you make predictions.
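To make that concrete, here is a minimal NumPy sketch of the inverted dropout idea (function and variable names are my own, not from the course code): dividing by `keep_prob` at training time keeps the expected activation the same, so no rescaling is needed at prediction time.

```python
import numpy as np

def dropout_forward(a, keep_prob, training=True):
    """Inverted dropout on activations `a`.

    At training time, randomly zero out units and divide the survivors
    by keep_prob so that the expected value of the output matches `a`.
    At prediction time, return the activations unchanged.
    """
    if not training:
        return a  # no dropout and no rescaling at prediction time
    mask = np.random.rand(*a.shape) < keep_prob
    # Scaling by 1/keep_prob compensates for the dropped units, so later
    # layers see the same expected "energy" in training and prediction.
    return a * mask / keep_prob
```

With `keep_prob = 0.8`, roughly 80% of units survive, and each survivor is scaled by 1/0.8 = 1.25, so the mean activation stays about the same as with no dropout at all.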