Forward propagation - Neural Network model

Hello, I have multiple questions that I would like to clarify before advancing deeper.

In the next picture, I’ve learned how a network model works and how each neuron calculates the sigmoid function.
Each neuron returns a value and the layer sends the result to the second layer, but when are gradient descent and the cost function executed? How does each neuron interact with the others? How do I get the best w for each feature, plus b?



Hi @gmazzaglia, great questions. Let’s go through them one by one.

  1. When are gradient descent and the cost function executed?

The cost function measures how wrong your model is by comparing its predictions with the actual values. Both it and gradient descent run during the training phase: the model makes a prediction, the cost function compares it with the actual output, and gradient descent uses the gradient of that cost to update the weights and biases of your model.
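To make the "how wrong is the model" idea concrete, here is a minimal sketch of a cost function in plain Python. I am assuming binary cross-entropy (the usual cost for a sigmoid output); the function name and the sample numbers are made up for illustration.

```python
import math

def binary_cross_entropy(y_true, y_pred):
    """Average binary cross-entropy over all examples."""
    eps = 1e-12  # keeps log() away from exactly 0
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Made-up labels and two sets of made-up predictions
y_true = [1, 0, 1, 0]
good_predictions = [0.9, 0.1, 0.8, 0.2]  # close to the labels
bad_predictions = [0.3, 0.7, 0.4, 0.6]   # far from the labels

cost_good = binary_cross_entropy(y_true, good_predictions)
cost_bad = binary_cross_entropy(y_true, bad_predictions)
print(cost_good, cost_bad)  # the first value is the smaller one
```

Predictions close to the labels give a low cost, and predictions far from them give a high cost. Training tries to push this number down.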

  2. How does each neuron interact with the others?

Neurons interact by passing their outputs forward: the output of a neuron in one layer becomes an input to the neurons in the next layer. You can think of a neuron as being composed of an input, an activation function, and an output. The neuron first combines its inputs using its weights and bias, then applies the activation function. For example:

NeuronA: Output: 3 → NeuronB: Input (3) → Activation function (transforms the number) → Output. You repeat this process until you reach the final layer.
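The NeuronA → NeuronB chain above can be sketched in plain Python like this. It is only an illustration: the `dense` helper, the layer sizes, and all the weight values are made up, and I am assuming a sigmoid activation in every neuron.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """One layer: each neuron computes sigmoid(w . inputs + b)."""
    outputs = []
    for w, b in zip(weights, biases):
        z = sum(wi * xi for wi, xi in zip(w, inputs)) + b
        outputs.append(sigmoid(z))
    return outputs

x = [0.5, 1.5]  # one example with 2 features

# Layer 1: 3 neurons, each with 2 weights (made-up values)
W1 = [[0.2, -0.4], [0.7, 0.1], [-0.3, 0.5]]
b1 = [0.0, 0.1, -0.1]

# Layer 2: 1 neuron that receives the 3 outputs of layer 1
W2 = [[0.6, -0.2, 0.3]]
b2 = [0.05]

a1 = dense(x, W1, b1)   # outputs of layer 1
a2 = dense(a1, W2, b2)  # layer 1 outputs become layer 2 inputs
print(a1, a2)
```

Note that forward propagation alone does no learning. It only turns inputs into a prediction using whatever weights the network currently has.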

  3. How do I get the best w for each feature + b?

You use gradient descent together with a learning rate. Gradient descent repeatedly adjusts the weights and biases in the direction that lowers the cost, and the learning rate controls how large each adjustment is.
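Here is a minimal sketch of that update rule for logistic regression with two features, in plain Python. The tiny dataset, the starting values, the learning rate of 0.1, and the helper names `cost` and `gradient_step` are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(X, y, w, b):
    """Average binary cross-entropy for the current w and b."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        total += -(yi * math.log(p) + (1 - yi) * math.log(1 - p))
    return total / len(X)

def gradient_step(X, y, w, b, learning_rate):
    """One gradient descent update of w and b."""
    m = len(X)
    dw = [0.0] * len(w)
    db = 0.0
    for xi, yi in zip(X, y):
        error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
        for j in range(len(w)):
            dw[j] += error * xi[j] / m
        db += error / m
    new_w = [wj - learning_rate * dwj for wj, dwj in zip(w, dw)]
    new_b = b - learning_rate * db
    return new_w, new_b

# Made-up data: 4 examples, 2 features each
X = [[0.5, 1.0], [1.0, 1.5], [2.0, 3.0], [3.0, 3.5]]
y = [0, 0, 1, 1]

w, b = [0.0, 0.0], 0.0
initial_cost = cost(X, y, w, b)
for _ in range(1000):
    w, b = gradient_step(X, y, w, b, learning_rate=0.1)
final_cost = cost(X, y, w, b)
print(initial_cost, final_cost)  # the cost goes down as w and b improve
```

The weights are set once at the start and then nudged a little on every iteration. A larger learning rate takes bigger steps, which can be faster but can also overshoot.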

Let me know if this answers your question!

Hi @pastorsoto, thanks for the reply, but it’s still not clear to me.

The first course explains that you start by looking for the best set of weights and bias to get the best linear function (in linear regression) or, for a classification model, the best sigmoid function (in logistic regression) that can classify a new value (this only works if, when you plot the data, you see that a linear decision boundary can separate the classes).

For instance,
You have a dataset with 2 features, so you have a table/matrix with 2 columns, X1 and X2, with a lot of historical data. Following the example from the course: the age of the patients and the size of the tumour.
With that data you train the model so that you can ask: “Given a new patient’s age and tumour size, tell me, based on a threshold of 0.5, whether the tumour is malignant or benign.”
How do you train it? You iterate thousands of times, generating random weights and bias to find the function that aligns with your historical data, and during each iteration you take the derivative with respect to each weight and the bias until the cost function is approximately zero.
When the cost function tends to zero, you have the weights and the bias that you can use to predict a new result, so you have the best w1, w2 and b to apply to the features: w1 X1 + w2 X2 + b.

So if you enter a new age and size, you apply the sigmoid function with the calculated weights and bias, and you get true or false based on the threshold.

That is my understanding so far … I hope it is correct :slight_smile:

My question is: when does everything explained above happen in a neural network model implemented with TensorFlow?


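Great! Yes, your understanding is correct, with one nuance: the weights and bias are initialized once (usually to small random values) and then updated by gradient descent on every iteration. They are not regenerated randomly each time.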
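The prediction step you describe (w1 X1 + w2 X2 + b, then sigmoid, then the 0.5 threshold) could look roughly like this. The values of w1, w2 and b below are invented, not the result of real training, and `predict_malignant` is just a name I picked for the sketch.

```python
import math

def predict_malignant(age, tumour_size, w1, w2, b, threshold=0.5):
    """Return (probability, decision) for one new patient."""
    z = w1 * age + w2 * tumour_size + b
    probability = 1.0 / (1.0 + math.exp(-z))  # sigmoid
    return probability, probability >= threshold

# Hypothetical "trained" parameters, for illustration only
w1, w2, b = 0.04, 0.9, -5.0

probability, is_malignant = predict_malignant(
    age=50, tumour_size=4.0, w1=w1, w2=w2, b=b
)
print(probability, is_malignant)
```

Once training has produced the weights and bias, predicting for a new patient is just this one forward calculation.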

TensorFlow abstracts everything that is happening here, so it is all calculated when you run the training step (the call to fit).

Here is an example of the code:

import tensorflow as tf

# X (features), y (0/1 labels) and num_features are assumed to be defined already

# Define the model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(num_features,)),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model and store training history
history = model.fit(X, y, epochs=10, batch_size=32, validation_split=0.2)

In this code I defined a model and compiled it, selecting the optimizer (we could also use plain gradient descent, for example optimizer='sgd', instead of adam), the loss function (how wrong our model is), and accuracy as the metric we want to monitor. Once we call fit, training starts: the data is passed through the model, and the optimizer updates the weights and biases to decrease the loss function we selected.

The picture above is an example of the loss function decreasing. If you look at the Keras source code for the fit method, you can see that it is actually a loop that evaluates the results of the model in each iteration:

keras/keras/src/backend/tensorflow/ at v3.3.3 · keras-team/keras
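To give a feel for that loop without reading the Keras source, here is a toy version in plain Python. This is not the Keras implementation: the `fit` function below is my own simplified stand-in with one weight and one bias, and it uses plain mini-batch gradient descent rather than adam. It only mirrors the overall shape (epochs, then batches, then forward pass, loss, and weight update).

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(xs, ys, epochs=10, batch_size=2, learning_rate=0.5, seed=0):
    """Toy training loop: one weight, one bias, mini-batch gradient descent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0                      # initialized once
    history = []                         # loss after each epoch
    indices = list(range(len(xs)))
    for epoch in range(epochs):
        rng.shuffle(indices)             # new batch order every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            dw = db = 0.0
            for i in batch:
                # forward pass, then gradient of the loss for this example
                error = sigmoid(w * xs[i] + b) - ys[i]
                dw += error * xs[i] / len(batch)
                db += error / len(batch)
            w -= learning_rate * dw      # update after every batch
            b -= learning_rate * db
        loss = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
        history.append(loss / len(xs))
    return w, b, history

# Made-up one-feature dataset
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

w, b, history = fit(xs, ys)
print(history[0], history[-1])  # loss after the first and last epoch
```

So the cost function and the weight updates run inside every batch of every epoch, and the per-epoch loss values are the kind of numbers you get back in the training history.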

Let me know if this makes sense!

This is great!
Thanks @pastorsoto
