In inverted dropout, we want some units to be zeroed out so that the effective complexity of the neural network decreases.
After you multiply a3 by d3 (a random boolean matrix whose entries are True where a uniform random number is less than keep_prob), you get a matrix a3 with some elements randomly zeroed out; a zero at a given position means that particular hidden unit has been eliminated.
But the reason behind the scaling (a3 /= 0.8, i.e. a3 /= keep_prob) is that zeroing out units lowers the expected value of a3, so every value computed from it downstream (e.g. z4 = W4·a3 + b4) would shrink as well, and that should not be the case; dividing by keep_prob restores the expected value of the activations.
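To make sure I have the mechanics right, here is a minimal NumPy sketch of the two steps (the 4x5 shape, the seed, and the Monte Carlo check at the end are my own additions for illustration; keep_prob = 0.8 as above):

```python
import numpy as np

np.random.seed(0)            # illustrative, for reproducibility
keep_prob = 0.8              # probability of keeping a hidden unit

# a3: activations of layer 3; the 4x5 shape is made up for this example
a3 = np.random.rand(4, 5)
a3_original = a3.copy()

# d3: boolean mask, True (unit kept) with probability keep_prob
d3 = np.random.rand(*a3.shape) < keep_prob

a3 = a3 * d3                 # zero out roughly (1 - keep_prob) of the units
a3 = a3 / keep_prob          # scale the surviving units by 1/keep_prob

# Monte Carlo check: averaged over many random masks, the dropped-and-scaled
# activations have (approximately) the same mean as the original activations.
means = [
    (a3_original * (np.random.rand(4, 5) < keep_prob) / keep_prob).mean()
    for _ in range(10_000)
]
print(a3_original.mean())    # mean without dropout
print(np.mean(means))        # should be close to the value above
```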
Please correct me if I am wrong!