Dear mentor,
I am also confused about this. After reading the paper you suggested, my understanding is that each neuron is kept with probability keep_prob (and zeroed out otherwise) during training, but the scaling by 1/keep_prob is done during the testing phase. This is what Figure 2 of the paper shows. Please tell me if I am missing something.
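
To make the question concrete, here is a minimal NumPy sketch of the alternative ordering, where the 1/keep_prob scaling is applied during training and nothing is rescaled at test time. The function names and the toy input are my own, not from the paper, and I am assuming each unit is kept with probability keep_prob.

```python
import numpy as np

def dropout_train_scaled(a, keep_prob, rng):
    """Hypothetical helper: drop units, then rescale the survivors (training only)."""
    # Each unit is kept with probability keep_prob and zeroed otherwise.
    mask = rng.random(a.shape) < keep_prob
    # Dividing by keep_prob keeps the expected activation equal to the input.
    return a * mask / keep_prob

def dropout_test_unscaled(a):
    """Hypothetical helper: at test time this variant applies no mask and no scaling."""
    return a

rng = np.random.default_rng(0)
a = np.ones((1000, 100))   # toy activations, all equal to 1
keep_prob = 0.8

out = dropout_train_scaled(a, keep_prob, rng)
print("mean activation during training:", out.mean())
print("mean activation at test time:", dropout_test_unscaled(a).mean())
```

If I understand correctly, the two means should come out close to each other, which is why this variant needs no extra scaling at test time.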