The particular case of `compute_total_loss` and why the transpose is required is discussed on this thread.

Each case will be determined by the particular circumstances: how the data is formatted and what the operations being used require. One other case I can think of was in the Logistic Regression discussions in DLS C1 W2. There we needed to transpose the weight vector w in order to make the linear activation work:

Z = w^T \cdot X + b

That was because Prof Ng uses the convention that standalone vectors are column vectors. So w has dimensions n_x x 1, and X is defined to have dimensions n_x x m in that case (this also relates to the thread linked above). The inner dimensions of w and X don't match as they stand, so we need the transpose: w^T has dimensions 1 x n_x, and the dot product with X then gives a Z with dimensions 1 x m.
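To see the shapes concretely, here is a minimal NumPy sketch. The sizes `n_x = 3` and `m = 5`, the random values, and the scalar `b` are made up for illustration and are not from the course notebook:

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration
n_x, m = 3, 5

rng = np.random.default_rng(0)
w = rng.standard_normal((n_x, 1))   # column vector: n_x x 1
X = rng.standard_normal((n_x, m))   # one sample per column: n_x x m
b = 0.5                             # scalar bias, broadcast across the m columns

# w.T is 1 x n_x, so the inner dimensions match and Z is 1 x m
Z = np.dot(w.T, X) + b
print(Z.shape)

# Without the transpose the inner dimensions are 1 and n_x, so NumPy refuses
try:
    np.dot(w, X)
    shape_error = False
except ValueError:
    shape_error = True
print(shape_error)
```

If X were instead laid out with samples as rows (m x n_x), the same check would tell you to compute `np.dot(X, w)` with no transpose at all, which is the point about each case depending on how the data is formatted.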