Hi! While I understand how to implement the forward propagation algorithm as it was described in Machine Learning Specialization Course 2, I find it difficult to understand how exactly our model learns the proper weights. First we generate some random weights to start with something. Then we feed our first neuron with them and with X. The next step is to apply an activation function; in my case it is the sigmoid. We repeat this over and over. Where exactly is the learning part? I suppose it’s about how the sigmoid function works, but I might be wrong. Thanks, and I wish you all the best.
My attempt to explain in a mostly non-mathematical way…
- Run forward propagation and produce an output value (prediction).
- Compute the error between the prediction and the known correct value (loss function).
- Make a small change to each weight in the direction that you think will reduce the total error (backward propagation).
- Repeat.
This process of iterating many times to gradually reduce the error between the predicted and known correct values is what we anthropomorphize as learning.
Note that math powers both the ‘compute the error’ and the ‘in the direction of reducing total error’ parts, and to really understand what learning means, you need to read and comprehend those equations/expressions.
EDIT: @TMosh uses the important word gradient below, which is part of the ‘in the direction of reducing error’ computation. “Learning” is a much more accessible way of describing this process than “iterative first-order optimization of a locally differentiable function.”
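To make those steps concrete, here is a minimal sketch of the full loop for a single-weight model. The toy data, the squared-error loss, and the learning rate alpha are all my own illustrative choices, not anything from the course:

import numpy as np

# Toy problem: learn w so that w * x matches y (the true relationship is y = 2x)
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([2.0, 4.0, 6.0, 8.0])

w = np.random.randn()       # start from a random weight
alpha = 0.01                # learning rate: how big each "small change" is

for step in range(1000):
    pred = w * X                        # 1. forward propagation -> prediction
    loss = np.mean((pred - Y) ** 2)     # 2. error between prediction and known value
    grad = np.mean(2 * (pred - Y) * X)  # direction in which the error grows...
    w = w - alpha * grad                # 3. ...so step the opposite way: w := w - alpha * dLoss/dw
                                        # 4. repeat

print(w)    # ends up very close to 2.0

Notice that the activation function only shapes the prediction; the learning itself happens entirely in the grad computation and the w = w - alpha * grad update.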
Using TensorFlow or sklearn, the learning happens behind the scenes. When you fit a model, in the background the layers are computing the gradients based on the results of forward propagation, as @ai_curious described. This learning process (updating the initial weights) is step 3 in his reply.
The same process is discussed in the earlier portions of the MLS course, where you implement functions that are usually called compute_gradients() and update_parameters().
If you haven’t reached that point yet, you’ll get there soon.
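For example, here is a rough sketch of what that looks like with TensorFlow. The dummy data, layer sizes, optimizer, and epoch count below are arbitrary choices of mine, not the assignment’s:

import numpy as np
import tensorflow as tf

# Dummy data: 100 examples with 2 features and a binary label
X = np.random.rand(100, 2)
y = (X[:, 0] + X[:, 1] > 1.0).astype(float)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(3, activation='sigmoid'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# compile() chooses the loss and the optimizer that will update the weights
model.compile(loss='binary_crossentropy', optimizer=tf.keras.optimizers.Adam(0.01))

# fit() runs the whole learning loop for you:
# forward prop -> loss -> gradients -> weight updates, repeated over the epochs
model.fit(X, y, epochs=20, verbose=0)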
The thing is, the code that was introduced as a “from scratch neural network” looks like this:
import numpy as np

def g(z):
    # g(z) is just the sigmoid function
    return 1.0 / (1.0 + np.exp(-z))

def my_dense(a_in, W, b):
    """
    Computes dense layer
    Args:
      a_in (ndarray (n, ))  : Data, 1 example
      W    (ndarray (n, j)) : Weight matrix, n features per unit, j units
      b    (ndarray (j, ))  : bias vector, j units
    Returns:
      a_out (ndarray (j,))  : j units
    """
    units = W.shape[1]
    a_out = np.zeros(units)
    for j in range(units):
        w = W[:, j]                     # weights feeding unit j
        z = np.dot(w, a_in) + b[j]      # linear combination plus bias
        a_out[j] = g(z)                 # g(z) is just a sigmoid function
    return a_out

def my_sequential(x, W1, b1, W2, b2):
    a1 = my_dense(x, W1, b1)            # layer 1 activations
    a2 = my_dense(a1, W2, b2)           # layer 2 (output) activation
    return a2

def my_predict(X, W1, b1, W2, b2):
    m = X.shape[0]
    p = np.zeros((m, 1))
    for i in range(m):
        p[i, 0] = my_sequential(X[i], W1, b1, W2, b2)
    return p
I don’t understand where the gradient is in this code. It just passes the inputs, computed with the sigmoid g(z), on to the next layers, without computing any gradient.
Never mind, I just found out that above this code we train a Keras model to generate the weights, and then we pass them to our code. Thanks for the help!
What your code fragment doesn’t include is any training loop. This is where the iterative computation of prediction, error, and gradient occurs. As shown, your model isn’t doing any learning and your observation is entirely correct…it’s just making a prediction on one static set of weights and inputs.
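For illustration only, here is a rough sketch of what that missing training loop could look like for these two layers. This is my own simplified version, not the course’s compute_gradients()/update_parameters(); the function name my_train, the binary cross-entropy loss, the single sigmoid output unit, and the one-example-at-a-time updates are all my assumptions:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def my_train(X, Y, W1, b1, W2, b2, alpha=0.1, epochs=100):
    m = X.shape[0]
    for epoch in range(epochs):
        for i in range(m):
            x, y = X[i], Y[i]

            # Forward propagation (the same math as my_dense / my_sequential)
            a1 = sigmoid(W1.T @ x + b1)         # hidden-layer activations, shape (j1,)
            a2 = sigmoid(W2.T @ a1 + b2)        # output prediction, shape (1,)

            # Backward propagation: gradients of the binary cross-entropy loss
            dz2 = a2 - y                        # error signal at the output
            dW2 = np.outer(a1, dz2)             # gradient with respect to W2
            db2 = dz2                           # gradient with respect to b2
            dz1 = (W2 @ dz2) * a1 * (1 - a1)    # error pushed back through layer 1
            dW1 = np.outer(x, dz1)              # gradient with respect to W1
            db1 = dz1                           # gradient with respect to b1

            # The "learning" step: nudge every parameter against its gradient
            W1 -= alpha * dW1
            b1 -= alpha * db1
            W2 -= alpha * dW2
            b2 -= alpha * db2

    return W1, b1, W2, b2

After a call like my_train(X, Y, W1, b1, W2, b2), passing the returned parameters to my_predict should give noticeably better predictions than the random starting weights, and that improvement is the learning everyone is describing above.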