Matrix Calculus

I have a few questions about matrix calculus:
1. Is the Jacobian matrix defined in exactly this form, or can it equally be written as its transpose?


Is it correct as written here? (Please see this picture.)
image

2. If it can be written as a transpose, what is the result in that case? Please see this picture.

Thanks very much


That is beyond the scope of this course. Here’s a thread which has some links that are relevant.

What you are probably asking about is a shortcut that Prof Ng takes. Because he doesn’t want to cover the derivations and the matrix calculus behind them, he uses a convention that simplifies translating Gradient Descent into code: the gradient of the cost with respect to a vector or matrix has the same shape and orientation as that base object. That makes the “update parameters” logic simple to write, but (as I think you are pointing out) it is not how things come out if you do the full mathematical version. In the “pure math” formulation, the derivative with respect to an object ends up with the transposed shape of the base object.
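To make the shortcut concrete, here is a minimal sketch in plain Python. The loss (sum of squared entries), the helper names `loss`, `numerical_grad` and `update`, and the use of nested lists instead of the course's NumPy arrays are all assumptions made for illustration. The point is only that `dW` is stored in the same rows-by-columns layout as `W`, so the update is a plain elementwise subtraction with no transpose.

```python
def loss(W):
    # Hypothetical scalar loss: the sum of the squared entries of W.
    return sum(w * w for row in W for w in row)


def numerical_grad(f, W, eps=1e-6):
    # Central-difference estimate of d f / d W[i][j],
    # stored in the same rows-by-columns layout as W.
    grad = [[0.0] * len(row) for row in W]
    for i, row in enumerate(W):
        for j in range(len(row)):
            original = row[j]
            row[j] = original + eps
            up = f(W)
            row[j] = original - eps
            down = f(W)
            row[j] = original  # restore the entry before moving on
            grad[i][j] = (up - down) / (2 * eps)
    return grad


def update(W, dW, lr):
    # Because dW has the same layout as W, the update is elementwise.
    return [[w - lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(W, dW)]


W = [[1.0, -2.0, 3.0],
     [0.5, 0.0, -1.5]]          # a 2 x 3 parameter matrix
dW = numerical_grad(loss, W)    # also 2 x 3
W_new = update(W, dW, lr=0.1)

print(len(dW), len(dW[0]))      # layout of the gradient
print(loss(W), loss(W_new))     # loss before and after one step
```

With the transposed ("pure math") layout, `dW` would be 3 x 2 and the update step would need an explicit transpose first.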

I suppose the confusion comes from how we define Nabla (also called Del or the gradient operator), \nabla, which is sometimes written as a row vector and sometimes as a column vector. Let’s set Nabla aside at first and focus on the Jacobian.

\textbf{y} = \begin{bmatrix} f_1(\textbf{x})\\ f_2(\textbf{x})\\ f_3(\textbf{x})\\ : \\ f_m(\textbf{x}) \end{bmatrix} = \begin{bmatrix} f_1(x_1, x_2, x_3, ...., x_n) \\ f_2(x_1, x_2, x_3, ...., x_n) \\ f_3(x_1, x_2, x_3, ...., x_n) \\ : \\ f_m(x_1,x_2,x_3, ...., x_n) \end{bmatrix}

Then the Jacobian is, as you wrote,

\textbf{J} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \frac{\partial f_1}{\partial x_3} & ... & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \frac{\partial f_2}{\partial x_3} & ... & \frac{\partial f_2}{\partial x_n} \\ \frac{\partial f_3}{\partial x_1} & \frac{\partial f_3}{\partial x_2} & \frac{\partial f_3}{\partial x_3} & ... & \frac{\partial f_3}{\partial x_n} \\ : \\ \frac{\partial f_m}{\partial x_1} & \frac{\partial f_m}{\partial x_2} & \frac{\partial f_m}{\partial x_3} & ... & \frac{\partial f_m}{\partial x_n} \end{bmatrix}

To understand what \nabla is, let’s simplify this by taking m=1. In this case, the Jacobian can be written as follows. It is an n-dimensional row vector.

f^{'}(\textbf{x}) = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} & \frac{\partial f}{\partial x_3} & ... & \frac{\partial f}{\partial x_n} \end{bmatrix}

Then the gradient can be written as \nabla f(\textbf{x}), which is:

\nabla f(\textbf{x}) = f^{'}(\textbf{x})^{T} = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \frac{\partial f}{\partial x_3} \\ : \\ \frac{\partial f}{\partial x_n} \\ \end{bmatrix}

Yes, this is a column vector, not a row vector. However, many people write it as a row vector, or say that the orientation does not matter. In most cases that may be fine, but again, I suppose this is where the confusion comes from.
I believe that, from a mathematical viewpoint, if we want to express the Jacobian in terms of \nabla, it would be:

\textbf{J} = \begin{bmatrix} \nabla f_1(\textbf{x})^{T} \\ \nabla f_2(\textbf{x})^{T} \\ \nabla f_3(\textbf{x})^{T} \\ : \\ \nabla f_m(\textbf{x})^{T} \end{bmatrix} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \frac{\partial f_1}{\partial x_3} & ... & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \frac{\partial f_2}{\partial x_3} & ... & \frac{\partial f_2}{\partial x_n} \\ \frac{\partial f_3}{\partial x_1} & \frac{\partial f_3}{\partial x_2} & \frac{\partial f_3}{\partial x_3} & ... & \frac{\partial f_3}{\partial x_n} \\ : \\ \frac{\partial f_m}{\partial x_1} & \frac{\partial f_m}{\partial x_2} & \frac{\partial f_m}{\partial x_3} & ... & \frac{\partial f_m}{\partial x_n} \end{bmatrix}
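If it helps, the relation "row i of the Jacobian is the transposed gradient of f_i" can be checked numerically. The following is only a sketch: the example function `f` (n = 3 inputs, m = 2 outputs) and the helpers `jacobian` and `gradient` are made up for illustration, and the derivatives are estimated with central differences rather than computed exactly.

```python
import math


def f(x):
    # Hypothetical example function with n = 3 inputs and m = 2 outputs.
    x1, x2, x3 = x
    return [x1 * x2, math.sin(x3) + x1 ** 2]


def jacobian(func, x, eps=1e-6):
    # m x n matrix: entry J[i][j] approximates d f_i / d x_j.
    m, n = len(func(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = func(x_plus), func(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J


def gradient(scalar_func, x, eps=1e-6):
    # The n partial derivatives of a scalar function,
    # to be read as the entries of a column vector.
    grad = []
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        grad.append((scalar_func(x_plus) - scalar_func(x_minus)) / (2 * eps))
    return grad


x = [1.0, 2.0, 0.5]
J = jacobian(f, x)                         # 2 rows, 3 columns
grad_f1 = gradient(lambda v: f(v)[0], x)   # gradient of f_1
grad_f2 = gradient(lambda v: f(v)[1], x)   # gradient of f_2

for row, grad in zip(J, [grad_f1, grad_f2]):
    print(row, grad)                       # each row matches its gradient
```

For this `f`, the exact Jacobian is [[x_2, x_1, 0], [2 x_1, 0, cos(x_3)]], so the printed rows should be close to those values at the chosen point.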

Yes, your opinion is great.