Professor Andrew said that he thinks of a NN as a bit like logistic regression, which is why we ended up using the w’s, the b’s, and g(z).
However, where do we use concepts such as the loss and cost functions to minimize the error with respect to our actual y? Looking at the next figure, first, I would have guessed that a3 would be a scalar, not a vector; second, I would have thought that we would use the whole logistic regression process to predict either 1 or 0. Can someone help me with this?
I didn’t watch the course video, so my reply might be very general:

Loss function: a typical handwritten digit recognition problem is treated as a classification problem, and cross-entropy is usually used as the loss function. You can read more here: "A Gentle Introduction to Cross-Entropy for Machine Learning" (MachineLearningMastery.com)
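To make the link to logistic regression concrete, here is a sketch of the cross-entropy loss. The symbols are my own assumptions, not necessarily the course's notation: K is the number of classes, y is the one-hot true label, ŷ is the vector of predicted probabilities, and m is the number of training examples.

```latex
% Loss for one example (y is one-hot, \hat{y} holds the predicted probabilities)
L(\hat{y}, y) = -\sum_{k=1}^{K} y_k \log \hat{y}_k

% Cost: the average loss over the m training examples
J = \frac{1}{m} \sum_{i=1}^{m} L\left(\hat{y}^{(i)}, y^{(i)}\right)

% With K = 2 this reduces to the familiar logistic regression loss
L(\hat{y}, y) = -y \log \hat{y} - (1 - y) \log(1 - \hat{y})
```

So the loss and cost play the same role as in logistic regression; they are just applied to the output of the last layer.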

Output layer: it needs to be a vector. The number of elements in this vector is the same as the number of digits you want to recognize. I assume there are 10 digits, so there will be 10 elements in the vector. Each element is the probability of one digit, and the elements sum to 1. For example, one possible output could be roughly y = [0.01, 0.98, 0.01, 0, 0, 0, 0, 0, 0, 0], and in this case you are predicting the second digit. This is the output after the softmax activation function. You can read more about the softmax function here: "Softmax Activation Function — How It Actually Works" by Kiprono Elijah Koech (Towards Data Science)
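A minimal sketch of that last step, in plain Python, assuming a 10-class output layer. The raw scores in `z` are made-up values for illustration, and the function names `softmax` and `cross_entropy` are my own, not from the course.

```python
import math

def softmax(z):
    """Turn raw output-layer scores into probabilities that sum to 1."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_index):
    """Loss for one example: minus the log of the probability given to the true class."""
    return -math.log(probs[true_index])

# Hypothetical raw scores for the 10 output units (one per digit class)
z = [1.0, 6.0, 1.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0]

probs = softmax(z)

# The prediction is the position of the largest probability (index 1 = second entry)
predicted = max(range(len(probs)), key=lambda k: probs[k])

# Loss if the true label is also the second entry
loss = cross_entropy(probs, true_index=1)

print([round(p, 3) for p in probs])
print("predicted index:", predicted)
print("loss:", loss)
```

The loss gets smaller as the probability assigned to the true class approaches 1, which is what training tries to achieve.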
Hope it helps.
I had exactly the same thoughts while following the course.
In the course, the parameters W and b are already given to the NN.
But how are they calculated for each unit?
It is a bit blurry here.
I understand logistic regression for one unit.
But how can the parameters w and b differ from one unit to another in the same layer, given that all the units have the same input X?
I hope someone can help me here.