The input (a_in) to each neuron has shape (400, 1), and the weight matrix (w) for the layer is (400, 25). When we do z = np.dot(w, a_in), the multiplication should only be possible if the number of columns of the first matrix matches the number of rows of the second. That's not the case here, and we are not transposing the matrix either. Yet the my_dense function still works correctly. How is that happening?

The confusion here is about the dimensions of the matrices involved in the operation np.dot(w, a_in) inside the my_dense function. The operation only works if those dimensions are compatible, so it is worth looking at them carefully.

In the context of neural networks, a_in is the activation from the previous layer (or the input layer if it is the first hidden layer) and w is the weight matrix for the current layer.

a_in has the shape (400, 1), which means it is a column vector with 400 elements.

w has the shape (400, 25), which indicates that there are 25 neurons in the current layer, and each neuron has 400 weights corresponding to the 400 inputs.

For the operation np.dot(w, a_in) to be valid, the inner dimensions of the two matrices must match. Typically the weight matrix w would have shape (25, 400), so that when it is multiplied by a_in (of shape (400, 1)), the inner dimensions (400) match, and the result has shape (25, 1): the activations of the current layer.
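To make the shape rule concrete, here is a small NumPy check (the variable names and the all-ones arrays are just for illustration):

```python
import numpy as np

a_in = np.ones((400, 1))   # column vector of 400 inputs

# Shape (400, 25): inner dimensions are 25 and 400 -> np.dot raises ValueError
w_bad = np.ones((400, 25))
try:
    np.dot(w_bad, a_in)
except ValueError as e:
    print("shape mismatch:", e)

# Shape (25, 400): inner dimensions are both 400 -> result has shape (25, 1)
w_ok = np.ones((25, 400))
z = np.dot(w_ok, a_in)
print(z.shape)             # (25, 1)
```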

If the my_dense function is working correctly without explicitly transposing the matrix, then it’s likely that the weight matrix w is already defined in the transposed form (25, 400) in the function call. If that’s the case, then np.dot(w, a_in) would indeed give the correct result since the inner dimensions (400) match, resulting in a (25, 1) shape output corresponding to the activations of the 25 neurons in the current layer.

In summary, for the matrix multiplication to work without transposing, the weight matrix w must be defined as (number of neurons, number of inputs), which in your case would be (25, 400) rather than (400, 25). Either the shape description is mistaken, or the function is transposing (or indexing) the weight matrix internally.
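Under that assumption, a minimal my_dense sketch would look like this. The (25, 400) storage convention, the bias vector b, and the sigmoid activation are assumptions for illustration, not taken from the original function:

```python
import numpy as np

def my_dense(a_in, W, b):
    """One dense layer, assuming W has shape (units, inputs)."""
    z = np.dot(W, a_in) + b      # (units, inputs) @ (inputs, 1) -> (units, 1)
    return 1 / (1 + np.exp(-z))  # sigmoid activation (assumed)

a_in = np.zeros((400, 1))
W = np.zeros((25, 400))
b = np.zeros((25, 1))
print(my_dense(a_in, W, b).shape)   # (25, 1)
```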

I do not think that is true.
a1 must be the size of the 1st hidden layer, which has 25 units.
a2 must be the size of the 2nd hidden layer, which has 15 units.
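For what it's worth, chaining layers with those unit counts does produce the shapes described above. This sketch assumes weights stored as (units, inputs) and zero-valued placeholders; it is purely illustrative:

```python
import numpy as np

a_in = np.zeros((400, 1))   # input: 400 features
W1 = np.zeros((25, 400))    # 1st hidden layer: 25 units
W2 = np.zeros((15, 25))     # 2nd hidden layer: 15 units

a1 = np.dot(W1, a_in)       # shape (25, 1)
a2 = np.dot(W2, a1)         # shape (15, 1)
print(a1.shape, a2.shape)
```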