In dropout, why do we want to maintain the expected value of a[l]?

The key point to realize is that all forms of regularization (dropout, L2, Lasso …) apply only at training time, not at test time or in normal inference mode (prediction). With inverted dropout, you divide the surviving activations by keep_prob so that the expected value of a[l] is unchanged. If you train without that scaling, the layers after the dropout layer learn to expect less “energy” from the previous layer than they will actually receive at test time, when no units are dropped.
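A minimal NumPy sketch of this idea (the array shapes and keep_prob value are illustrative, not from the original post): with inverted dropout, the mean activation after dropping and rescaling stays close to the original mean, so downstream layers see the same expected input at train and test time.

```python
import numpy as np

rng = np.random.default_rng(0)
keep_prob = 0.8

# Hypothetical activations a[l] from some layer (values are illustrative).
a = rng.random((1000, 100))

# Inverted dropout: zero out units with probability 1 - keep_prob,
# then divide by keep_prob so E[a_dropped] == E[a].
mask = rng.random(a.shape) < keep_prob
a_dropped = (a * mask) / keep_prob

# Without the division, the expected activation would shrink by keep_prob,
# and later layers would be trained on that smaller scale.
a_unscaled = a * mask

print(a.mean(), a_dropped.mean(), a_unscaled.mean())
```

Running this shows `a.mean()` and `a_dropped.mean()` nearly equal (around 0.5 here), while `a_unscaled.mean()` is roughly keep_prob times smaller, which is exactly the train/test mismatch the scaling prevents.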

This question has been asked a number of times before. Here’s a good thread to read, which also points to this one.