A1 and D1 are supposed to have the same shape, so A1 * D1 is an element-wise multiplication, whereas np.dot(A1, D1) is a matrix multiplication when both operands are 2-D. The two are completely different operations, and only one of them is what we need. The answer is element-wise multiplication, because our objective is to zero out the dropped units.

D1 is like a mask, and it has the same shape as A1 because D1 tells you which elements of A1 you want to zero out; the elements of D1 and the elements of A1 are in one-to-one correspondence. That's why you want element-wise multiplication.

np.dot and element-wise multiplication are completely different operations; you have to use the one that is appropriate for what you are doing. In the case of dropout, we are randomly zeroing different elements in each column of the activation matrix, so computing a Boolean "mask" with the same shape as the activation and then doing an element-wise multiply does exactly what we need. np.dot, by contrast, takes the dot product of a row of the first operand with a column of the second operand. How would that be useful for "zapping" neurons, as we need to do in the dropout case?
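To make the mask idea concrete, here is a minimal sketch of the masking step (the array names and `keep_prob` value are illustrative, not taken from any particular assignment):

```python
import numpy as np

np.random.seed(0)

# A small hypothetical activation matrix: 3 hidden units, 4 examples
A1 = np.random.rand(3, 4)
keep_prob = 0.8

# Boolean mask with the same shape as A1: True where a unit is kept
D1 = np.random.rand(*A1.shape) < keep_prob

# Element-wise multiply zeros out the dropped units; dividing by
# keep_prob is the usual "inverted dropout" rescaling of the survivors
A1_dropped = (A1 * D1) / keep_prob

# Every masked-out entry is exactly zero, and the shape is unchanged
print(A1_dropped.shape)          # (3, 4)
print(np.all(A1_dropped[~D1] == 0))
```

Each element of the mask lines up with exactly one element of the activation, which is precisely the one-to-one correspondence that element-wise multiplication expresses and np.dot does not.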

Of course, dropout is just one step in forward propagation, and forward propagation does involve dot products. Everything always goes back to the math: first ask what you are trying to do mathematically, and then which linear algebra operations express what the formula says.
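Both operations appear in a single forward-propagation step, each in its proper place. A rough sketch, assuming a ReLU hidden layer and made-up shapes (the variable names follow common course notation but are not from any specific assignment):

```python
import numpy as np

np.random.seed(1)

# Hypothetical shapes: 2 input features, 3 hidden units, 4 examples
X = np.random.randn(2, 4)
W1 = np.random.randn(3, 2)
b1 = np.zeros((3, 1))
keep_prob = 0.8

# The linear step of forward propagation uses a matrix multiplication
Z1 = np.dot(W1, X) + b1
A1 = np.maximum(0, Z1)                 # ReLU activation

# The dropout step uses an element-wise multiplication with a mask
D1 = np.random.rand(*A1.shape) < keep_prob
A1 = (A1 * D1) / keep_prob

print(A1.shape)                        # (3, 4)
```

So np.dot computes the layer's linear combination, while the element-wise multiply implements the formula "multiply each activation by its own keep/drop indicator" — two different formulas, two different operations.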