Inconsistent Gradient Checking algorithm question

I am debugging the assignment W1A3 Gradient_Checking. I noticed that the formula for gradient checking differs between the course notes and the assignment.

The notes/slide show the formula as

|| dtheta_approx - dtheta ||_2 / (|| dtheta_approx ||_2 - || dtheta ||_2)

or (in LaTeX)

$$ difference = \frac {\mid\mid grad - gradapprox \mid\mid_2}{\mid\mid grad \mid\mid_2 + \mid\mid gradapprox \mid\mid_2} \tag{2}$$

vs.

$$ difference = \frac {\mid\mid gradapprox - grad \mid\mid_2}{\mid\mid gradapprox \mid\mid_2 + \mid\mid grad \mid\mid_2} \tag{2}$$

However,

the assignment describes the formula with the elements reversed, as:

|| dtheta - dtheta_approx ||_2 / (|| dtheta ||_2 - || dtheta_approx ||_2)

While I understand this is almost a Euclidean distance formula, I think the results may differ between the two implementations, but I am not sure. So I have two questions:

  1. Does the order matter?
  2. Does the “subscript 2” in the formulas (both in the slides and the assignment) mean “raised to the power of 2”? If so, shouldn’t it be a superscript? And if not, why not?

Thanks for any guidance.

The forums support LaTeX as described on the DLS FAQ Thread. Using your formulas with the local syntax for LaTeX for clarity gives:

$$ difference = \frac {\mid\mid grad - gradapprox \mid\mid_2}{\mid\mid grad \mid\mid_2 + \mid\mid gradapprox \mid\mid_2} \tag{2}$$

Versus

$$ \frac{|| dtheta - dthetaapprox ||_2}{|| dtheta ||_2 - || dthetaapprox ||_2} $$

The second formula with the subtraction in the denominator is incorrect. The course notes are not really maintained, but I will report that as a bug.

The first formula is correct, and it is what is shown in the assignment. The order does not matter: the denominator is a sum, and addition is commutative. In the numerator we take the norm of the difference, so reversing the operands would not change the result there either.
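As a quick sanity check of both claims (using made-up example vectors, not the assignment's actual values), you can verify in numpy that the numerator is symmetric in its operands and the denominator is order-independent:

```python
import numpy as np

# Hypothetical example vectors standing in for grad and gradapprox.
grad = np.array([0.5, -1.2, 3.0])
gradapprox = np.array([0.501, -1.199, 3.002])

# The numerator is symmetric: ||a - b||_2 == ||b - a||_2,
# because the norm discards the sign of each component.
num1 = np.linalg.norm(grad - gradapprox)
num2 = np.linalg.norm(gradapprox - grad)
assert np.isclose(num1, num2)

# The denominator is a sum, and addition is commutative.
den1 = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
den2 = np.linalg.norm(gradapprox) + np.linalg.norm(grad)
assert den1 == den2

difference = num1 / den1
print(difference)
```

Either operand order gives the same `difference`, which is why both forms you quoted (with the `+` denominator) are equivalent.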

The subscript 2 means that the norms are the “2-norm” which is the Euclidean distance.

Thanks for the quick response, Paul!

Could you clarify “2-norm” in this instance? How do I implement it, if in fact the subscript 2 acts as an operator? Thanks. In Euclidean terms, as I understand it, it would mean a sum of squares, so are you saying it acts as “raised to the power of 2”?

Here is the screen shot of that page from the course notes:


So the denominator is correct and is not what you showed. The numerator has the subtraction reversed, but that doesn’t matter because we are taking the norm which is the multi-dimensional equivalent of taking an absolute value. So the order doesn’t matter.

The 2-norm is the square root of the sum of the squares of the elements of the vector.

$$ ||v||_2 = \sqrt {\displaystyle \sum_{i = 1}^n v_i^2} $$

which is the Euclidean length of the vector.

In numpy, you implement that using np.linalg.norm.
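For example, a minimal sketch showing that `np.linalg.norm` with its default arguments computes exactly that square root of the sum of squares:

```python
import numpy as np

v = np.array([3.0, 4.0])

# For a 1-D array, np.linalg.norm defaults to the 2-norm.
norm_np = np.linalg.norm(v)

# Equivalent manual computation: square root of the sum of squares.
norm_manual = np.sqrt(np.sum(v ** 2))

print(norm_np)      # 5.0
print(norm_manual)  # 5.0
```

Note that the default only applies to vectors; for matrices, `np.linalg.norm` defaults to the Frobenius norm, so pass `ord=2` explicitly if you ever need the matrix 2-norm.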


Thanks Paul. So it seems the subscript “2” has a heavily overloaded meaning when coupled with what I took to be absolute value bars: ||x||_2. I did not know that; perhaps I missed it in the lecture. Thank you so much for the help!

The double-vertical bar isn’t an absolute value operator. It’s the “Norm” operator.

Thank you!

That solved my issue, I replaced abs() with np.linalg.norm(). Thanks so much for the help!


Professor Ng mentions that at about 3:44 in the lecture on Gradient Checking. Here’s a screenshot of the transcript at that point: