The transpose is needed because Prof Ng chooses the convention that all standalone vectors are formatted as column vectors. So both the weights vector w and the input vector x are n_x x 1 column vectors, where n_x is the number of features (elements) in each input vector.

The key point is that logistic regression computes its output in two steps. First there is the linear combination of w and x, plus the bias b:

z = \displaystyle \sum_{i = 1}^{n_x} (w_i * x_i) + b

Then we apply the non-linear sigmoid activation function to get the final output of logistic regression:

\hat{y} = \sigma(z)
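Here is a minimal NumPy sketch of those two steps for a single sample, with the sum still written as an explicit loop. The weights, input and bias values are made up for illustration, and `sigmoid` is a small helper defined here rather than a library function.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical example values: n_x = 3 features, stored as column vectors.
n_x = 3
w = np.array([[0.5], [-1.0], [2.0]])   # weights, shape (n_x, 1)
x = np.array([[1.0], [2.0], [0.5]])    # one input sample, shape (n_x, 1)
b = 0.25                               # bias, a scalar

# Step 1: the linear (affine) combination as an explicit sum of products.
z = sum(w[i, 0] * x[i, 0] for i in range(n_x)) + b   # z = -0.25

# Step 2: the non-linear sigmoid activation gives the prediction y_hat.
y_hat = sigmoid(z)
print(z, y_hat)
```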

So the question is how to express that first linear combination (strictly speaking an “affine” transformation, because of the + b) using vector operations for efficiency. The easiest way is to write the sum-of-products formula as a dot product:

z = w^T \cdot x + b

For this product to work, the inner dimensions need to agree. Both w and x are n_x x 1, so we transpose w to get w^T, which is 1 x n_x. Multiplying 1 x n_x by n_x x 1 gives a 1 x 1 (scalar) result, which is what we want. If you think about what “dot product” means, it is exactly the sum of the products w_i * x_i over each pair of elements in the two vectors, as in the formula above. We just need the transpose so that the operation is valid when both vectors are stored as n_x x 1 columns.
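A quick NumPy check of the shape argument, again with hypothetical values: `np.dot(w.T, x)` produces a 1 x 1 array equal to the explicit sum of products, while `np.dot(w, x)` without the transpose fails because the inner dimensions (1 and 3) do not agree.

```python
import numpy as np

# Hypothetical example values, column-vector convention: shape (n_x, 1).
w = np.array([[0.5], [-1.0], [2.0]])
x = np.array([[1.0], [2.0], [0.5]])
b = 0.25

print(w.shape, w.T.shape)        # (3, 1) (1, 3)

# (1, n_x) times (n_x, 1) -> shape (1, 1)
z_matrix = np.dot(w.T, x) + b
# The same quantity as an explicit sum of products.
z_loop = sum(w[i, 0] * x[i, 0] for i in range(w.shape[0])) + b

print(z_matrix.shape)                       # (1, 1)
print(np.isclose(z_matrix[0, 0], z_loop))   # True

# Without the transpose, (3, 1) times (3, 1) is not a valid matrix product.
try:
    np.dot(w, x)
except ValueError as err:
    print("shape mismatch:", err)
```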

Notice that we can then take one more step in vectorizing: concatenate m input vectors x as the columns of an input matrix X, which is n_x x m (one column per sample). Now you can compute the z values for all m samples at once:

Z = w^T \cdot X + b

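So we have 1 x n_x times n_x x m, which gives a 1 x m output, and the scalar b is added to every element of that row (in NumPy this happens through broadcasting). Applying the sigmoid element-wise then gives all m predictions: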

\hat{Y} = \sigma(Z)
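A sketch of the fully vectorized version, assuming randomly generated data (the sizes n_x = 3 and m = 5 are arbitrary). It compares the single matrix product against a loop that handles one column of X at a time, to show the two give the same predictions.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied element-wise to arrays."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_x, m = 3, 5                          # hypothetical sizes
w = rng.standard_normal((n_x, 1))      # weights, shape (n_x, 1)
b = 0.25                               # scalar bias
X = rng.standard_normal((n_x, m))      # m samples stacked as columns, shape (n_x, m)

# Vectorized: (1, n_x) times (n_x, m) -> (1, m); b is broadcast to every entry.
Z = np.dot(w.T, X) + b
Y_hat = sigmoid(Z)

# Reference: one sample at a time; X[:, [j]] keeps column j as shape (n_x, 1).
Y_loop = np.array([[sigmoid(np.dot(w.T, X[:, [j]])[0, 0] + b) for j in range(m)]])

print(Z.shape, Y_hat.shape)          # (1, 5) (1, 5)
print(np.allclose(Y_hat, Y_loop))    # True
```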