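If you compute Y^T \cdot log(A), that is an m x 1 vector dotted with a 1 x m vector, which gives an m x m result. That is why you needed the np.sum there. Note that summing the m x m matrix also adds in every cross term y_i log(a_j) with i ≠ j, so it does not give the value you want. If you had done the np.dot correctly, with the transpose on the second operand so that a 1 x m vector is dotted with an m x 1 vector, the result would already be 1 x 1 and the sum would not be necessary. To see why your method is wrong and to understand the correct method, please see this thread.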
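
Here is a small sketch of the shape argument, in case it helps to see it run. The toy values for Y and A are made up for illustration, and I am assuming both are stored as 1 x m row vectors, as the shapes above imply.

```python
import numpy as np

# Hypothetical toy data (not from the assignment): m = 3 examples,
# with labels Y and activations A stored as 1 x m row vectors.
Y = np.array([[1.0, 0.0, 1.0]])
A = np.array([[0.9, 0.2, 0.7]])

# Transpose on the first operand: (m, 1) dot (1, m) -> (m, m) outer product.
outer = np.dot(Y.T, np.log(A))

# Transpose on the second operand: (1, m) dot (m, 1) -> (1, 1) inner product.
inner = np.dot(Y, np.log(A).T)

print(outer.shape)  # (3, 3)
print(inner.shape)  # (1, 1)

# Reference value: elementwise product followed by a sum.
elementwise = np.sum(Y * np.log(A))

# The inner product already matches the reference, with no extra np.sum.
print(np.isclose(inner.item(), elementwise))  # True

# Summing the outer product includes all the cross terms, so it does not match.
print(np.isclose(np.sum(outer), elementwise))  # False
```

Only the second np.dot reproduces the elementwise sum. The first needs np.sum just to get down to a scalar, and the scalar it produces is a different number.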