Hello learners,
There are mainly 2 types of error:
- syntax error: our implemented function can't run at all, and doesn't produce any output
- non-syntax error (runtime/logic error): our implemented function runs through and produces output, but the output is not as expected
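For instance, here is a toy example (not from the lab) showing the difference:

```python
# A syntax error stops the function from even being defined, so nothing runs:
#
#   def f(x)            # <-- missing colon: SyntaxError
#       return x + 1
#
# A non-syntax error runs without complaint but gives a wrong answer:
def mean(values):
    total = 0
    for v in values:
        total += v
    return total / (len(values) - 1)   # bug: should divide by len(values)

print(mean([2, 4, 6]))   # prints 6.0, but the correct mean is 4.0
```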
Non-syntax error
I suggest we use (1) simple inputs and (2) printed variables to check our implemented function. To illustrate the idea, I took a function provided by an optional lab and edited it to make it wrong. Here it is:
Situation: implement the gradients defined by the following formulas (reconstructed here; they match the expected-outcome calculations further below):

\frac{\partial{J}}{\partial{w}} = \frac{1}{m}\sum_{i=0}^{m-1}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)x^{(i)}

\frac{\partial{J}}{\partial{b}} = \frac{1}{m}\sum_{i=0}^{m-1}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)

where f_{w,b}(x^{(i)}) = wx^{(i)} + b.
Wrong code:
1  def compute_gradient(x, y, w, b):
2      """
3      Computes the gradient for linear regression
4      Args:
5        x (ndarray (m,)): Data, m examples
6        y (ndarray (m,)): target values
7        w,b (scalar)    : model parameters
8      Returns
9        dj_dw (scalar): The gradient of the cost w.r.t. the parameters w
10       dj_db (scalar): The gradient of the cost w.r.t. the parameter b
11     """
12
13     # Number of training examples
14     m = x.shape[0]
15     dj_dw = 0
16     dj_db = 0
17
18     ### START CODE HERE ###
19     for i in range(m):
20         f_wb = w * x[i] + b
21         dj_dw_i = f_wb - y[i]          # <-- wrong line, missing "* x[i]"
22         dj_db_i = f_wb - y[i]
23         dj_db += dj_db_i
24     dj_dw += dj_dw_i                   # <-- wrong line, missing indentation
25     dj_dw = dj_dw / m
26     # <-- missing line "dj_db = dj_db / m"
27
28     if dj_db + 3 > 0:                  # <-- unneeded line
29         dj_db = 0                      # <-- unneeded line
30     ### END CODE HERE ###
31
32     return dj_dw, dj_db
Debugging steps:
Creating inputs
- We create a set of simple inputs. By Line 1, we know there are 4 inputs: `x`, `y`, `w`, `b`. By Lines 5 to 7, we know what they should look like.
- By Line 5, `x` is an array of size `(m,)`. Because we want it simple, let's create `x` with 3 examples (`m = 3`): `x = np.array([1, 2, 4])`
- By Line 6, `y` is an array of size `(m,)`. Let's create `y = np.array([2, 1, 3])`
- By Line 7, `w` and `b` are scalars, so let's create `w = 2`, `b = 1`
- Note that we use simple (but NON-ZERO) integers such as 1, 2, 3, and 4 so they are easy to work with.
- Created inputs:

x = np.array([1, 2, 4])
y = np.array([2, 1, 3])
w = 2
b = 1
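Optionally, before calling the function, a couple of assertions can confirm the hand-made inputs match what the docstring promises (this sketch assumes NumPy is imported as `np`, as in the lab):

```python
import numpy as np

x = np.array([1, 2, 4])
y = np.array([2, 1, 3])
w = 2
b = 1

# x and y should be 1-D arrays of the same length m; w and b are scalars
assert x.shape == y.shape == (3,)
assert np.isscalar(w) and np.isscalar(b)
```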
Adding print lines
- add print lines only between `### START CODE HERE ###` and `### END CODE HERE ###`
- print variables every time AFTER they are assigned a new value
- print the condition BEFORE every `if` statement
- print the returning variables
- add a number to identify which printed result refers to which print line; add the variable name too if preferred
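One possible way to follow the numbering convention above is a tiny helper (hypothetical, not part of the lab) so every print site emits its site number, the variable name, and the value in one consistent format:

```python
def dbg(site, name, value):
    # one line per print site: site number, variable name, current value
    line = f"{site} {name} {value}"
    print(line)
    return line  # returned only so the format is easy to check

# example usage inside a loop body (toy computation, for illustration only):
for i in range(2):
    dbg(2, "i", i)
    f_wb = 2 * i + 1
    dbg(3, "f_wb", f_wb)
```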
- After adding print lines:
1  def compute_gradient(x, y, w, b):
2      """
3      Computes the gradient for linear regression
4      Args:
5        x (ndarray (m,)): Data, m examples
6        y (ndarray (m,)): target values
7        w,b (scalar)    : model parameters
8      Returns
9        dj_dw (scalar): The gradient of the cost w.r.t. the parameters w
10       dj_db (scalar): The gradient of the cost w.r.t. the parameter b
11     """
12
13     # Number of training examples
14     m = x.shape[0]
15     dj_dw = 0
16     dj_db = 0
17
18     ### START CODE HERE ###
19     print(1, "m", m)                   # <-- ADDED
20     for i in range(m):
21         print(2, "i", i)               # <-- ADDED
22         f_wb = w * x[i] + b
23         print(3, "f_wb", f_wb)         # <-- ADDED
24         dj_dw_i = f_wb - y[i]          # <-- wrong line, missing "* x[i]"
25         print(4, "dj_dw_i", dj_dw_i)   # <-- ADDED
26         dj_db_i = f_wb - y[i]
27         print(5, "dj_db_i", dj_db_i)   # <-- ADDED
28         dj_db += dj_db_i
29         print(6, "dj_db", dj_db)       # <-- ADDED
30     dj_dw += dj_dw_i                   # <-- wrong line, missing indentation
31     print(7, "dj_dw", dj_dw)           # <-- ADDED
32     dj_dw = dj_dw / m
33     print(8, "dj_dw", dj_dw)           # <-- ADDED
34     # <-- missing line "dj_db = dj_db / m"
35
36     print(9, "dj_db + 3", dj_db + 3)   # <-- ADDED - print condition of `if` statement
37     if dj_db + 3 > 0:                  # <-- unneeded line
38         dj_db = 0                      # <-- unneeded line
39     print(10, "dj_db", dj_db)          # <-- ADDED
40     print(11, "dj_dw, dj_db", dj_dw, dj_db)  # <-- ADDED - print returning variables
41     ### END CODE HERE ###
42
43     return dj_dw, dj_db
Inspect the code with the printed output
- Run the code below
x = np.array([1, 2, 4])
y = np.array([2, 1, 3])
w = 2
b = 1
compute_gradient(x, y, w, b)
- Receive the output below
1 m 3
2 i 0
3 f_wb 3
4 dj_dw_i 1
5 dj_db_i 1
6 dj_db 1
2 i 1
3 f_wb 5
4 dj_dw_i 4
5 dj_db_i 4
6 dj_db 5
2 i 2
3 f_wb 9
4 dj_dw_i 6
5 dj_db_i 6
6 dj_db 11
7 dj_dw 6
8 dj_dw 2.0
9 dj_db + 3 14
10 dj_db 0
11 dj_dw, dj_db 2.0 0
- Inspect the output together with the formula, by first calculating the expected outcomes (here comes the benefit of using simple integers):

Formula:

\frac{\partial{J}}{\partial{w}} = \frac{1}{m}\sum_{i=0}^{m-1}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)x^{(i)}, \qquad \frac{\partial{J}}{\partial{b}} = \frac{1}{m}\sum_{i=0}^{m-1}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)

Expected outcomes:

\frac{\partial{J}}{\partial{w}} = \frac{1}{3}[(2\times1+1-2)\times1+(2\times2+1-1)\times2+(2\times4+1-3)\times4] = 11

\frac{\partial{J}}{\partial{b}} = \frac{1}{3}[(2\times1+1-2)+(2\times2+1-1)+(2\times4+1-3)] = 3.66667

Inspection work below:
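The expected outcomes can also be computed with a short vectorized sketch (assuming NumPy, as in the lab), which doubles as an independent cross-check of the hand calculation:

```python
import numpy as np

x = np.array([1, 2, 4])
y = np.array([2, 1, 3])
w, b = 2, 1

f_wb = w * x + b                          # predictions: [3, 5, 9]
dj_dw_expected = np.mean((f_wb - y) * x)  # (1 + 8 + 24) / 3 = 11.0
dj_db_expected = np.mean(f_wb - y)        # (1 + 4 + 6) / 3 ≈ 3.6667

print(dj_dw_expected, dj_db_expected)     # 11.0 3.6666666666666665
```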
1 m 3 # OK, because my created input has 3 samples
2 i 0 # OK, because i is iterated from 0 up to 2
3 f_wb 3 # OK, because the model is w * x + b, and for the first sample, it is 2*1+1=3
4 dj_dw_i 1 # OK, because by the formula, it should be (f - y) * x, so (3 - 2) * 1 = 1
5 dj_db_i 1 # OK, because by the formula, it should be (f - y) , so (3 - 2) = 1
6 dj_db 1 # OK, the purpose of this line is to accumulate dj_db_i, it should be 0 + 1 = 1
# WRONG !!! expecting a line to accumulate dj_dw_i, but there is none!
2 i 1 # OK, because i is iterated from 0 up to 2
3 f_wb 5 # OK, 2*2+1=5
4 dj_dw_i 4 # WRONG!!!! expecting (5 - 1) * 2 = 8
5 dj_db_i 4 # OK, (5 - 1) = 4
6 dj_db 5 # OK, (4 + 1) = 5
2 i 2 # OK, because i is iterated from 0 up to 2
3 f_wb 9 # OK, 4*2+1=9
4 dj_dw_i 6 # WRONG!!!! expecting (9 - 3) * 4 = 24
5 dj_db_i 6 # OK, (9 - 3) = 6
6 dj_db 11 # OK, (5 + 6) = 11
7 dj_dw 6 # WRONG!!! The loop has ended, and this variable should have accumulated all dj_dw_i which is 1 + 8 + 24 = 33
8 dj_dw 2.0 # WRONG!!! Because by the formula, it should be 33/3 = 11
# WRONG!!!! expecting a line to divide dj_db with m, but there is none. it should be 11/3 = 3.666667
9 dj_db + 3 14 # 14 is greater than 0, so the `if` statement is triggered
10 dj_db 0 # WRONG!!! `dj_db should be 3.666667`
11 dj_dw, dj_db 2.0 0 # WRONG!!! expecting 11 and 3.666667. ALSO, by the code's Line 9 and 10, both should be scalars, which they are.
Correct the code based on the finding
1  def compute_gradient(x, y, w, b):
2      """
3      Computes the gradient for linear regression
4      Args:
5        x (ndarray (m,)): Data, m examples
6        y (ndarray (m,)): target values
7        w,b (scalar)    : model parameters
8      Returns
9        dj_dw (scalar): The gradient of the cost w.r.t. the parameters w
10       dj_db (scalar): The gradient of the cost w.r.t. the parameter b
11     """
12
13     # Number of training examples
14     m = x.shape[0]
15     dj_dw = 0
16     dj_db = 0
17
18     ### START CODE HERE ###
19     print(1, "m", m)                   # <-- ADDED
20     for i in range(m):
21         print(2, "i", i)               # <-- ADDED
22         f_wb = w * x[i] + b
23         print(3, "f_wb", f_wb)         # <-- ADDED
24         dj_dw_i = (f_wb - y[i]) * x[i] # <-- FIXED - added "* x[i]"
25         print(4, "dj_dw_i", dj_dw_i)   # <-- ADDED
26         dj_db_i = f_wb - y[i]
27         print(5, "dj_db_i", dj_db_i)   # <-- ADDED
28         dj_db += dj_db_i
29         print(6, "dj_db", dj_db)       # <-- ADDED
30         dj_dw += dj_dw_i               # <-- FIXED - indented into the loop
31     print(7, "dj_dw", dj_dw)           # <-- ADDED
32     dj_dw = dj_dw / m
33     print(8, "dj_dw", dj_dw)           # <-- ADDED
34     dj_db = dj_db / m                  # <-- FIXED - added the missing line
35     print(8.1, "dj_db", dj_db)         # <-- ADDED
36
37     print(11, "dj_dw, dj_db", dj_dw, dj_db)  # <-- ADDED - print returning variables
38     ### END CODE HERE ###
39
40     return dj_dw, dj_db
Run the code again, and get the following outputs
1 m 3
2 i 0
3 f_wb 3
4 dj_dw_i 1
5 dj_db_i 1
6 dj_db 1
2 i 1
3 f_wb 5
4 dj_dw_i 8
5 dj_db_i 4
6 dj_db 5
2 i 2
3 f_wb 9
4 dj_dw_i 24
5 dj_db_i 6
6 dj_db 11
7 dj_dw 33
8 dj_dw 11.0
8.1 dj_db 3.6666666666666665
11 dj_dw, dj_db 11.0 3.6666666666666665 # as EXPECTED!
Remove the print lines and any other code added for this inspection work so they won't interfere with the grader
1  def compute_gradient(x, y, w, b):
2      """
3      Computes the gradient for linear regression
4      Args:
5        x (ndarray (m,)): Data, m examples
6        y (ndarray (m,)): target values
7        w,b (scalar)    : model parameters
8      Returns
9        dj_dw (scalar): The gradient of the cost w.r.t. the parameters w
10       dj_db (scalar): The gradient of the cost w.r.t. the parameter b
11     """
12
13     # Number of training examples
14     m = x.shape[0]
15     dj_dw = 0
16     dj_db = 0
17
18     ### START CODE HERE ###
19     for i in range(m):
20         f_wb = w * x[i] + b
21         dj_dw_i = (f_wb - y[i]) * x[i]
22         dj_db_i = f_wb - y[i]
23         dj_db += dj_db_i
24         dj_dw += dj_dw_i
25     dj_dw = dj_dw / m
26     dj_db = dj_db / m
27     ### END CODE HERE ###
28
29     return dj_dw, dj_db
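As a final sanity check before submitting, we can run the cleaned-up function against the simple inputs and compare with the hand-computed values. The function below just repeats the corrected code above, without the lab's line numbers, so the sketch is self-contained:

```python
import numpy as np

def compute_gradient(x, y, w, b):
    # corrected loop-based gradient for linear regression
    m = x.shape[0]
    dj_dw = 0
    dj_db = 0
    for i in range(m):
        f_wb = w * x[i] + b
        dj_dw += (f_wb - y[i]) * x[i]
        dj_db += f_wb - y[i]
    return dj_dw / m, dj_db / m

dj_dw, dj_db = compute_gradient(np.array([1, 2, 4]), np.array([2, 1, 3]), 2, 1)
assert dj_dw == 11.0                  # matches the expected outcome
assert abs(dj_db - 11 / 3) < 1e-9     # ~3.666667, matches the expected outcome
print("checks passed:", dj_dw, dj_db)
```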
Good luck!
Raymond