I am at this point in the course: [Advanced Learning Algorithms] > [Week 1] > [Inference in Code]. I'm not sure if this will be answered in future videos, but I want to ask anyway.
Professor Ng explains this in the video for the lecture "Inference in Code", starting at 4:02. Based on the input (layer 0), which is [[200.0, 17.0]], layer 1 produces an output such as [0.2, 0.7, 0.3], since layer 1 has 3 units. I understand that each neuron will have different "initial" weights (w, b). What I don't understand is why I need 3 neurons in layer 1, each performing a logistic regression. Wouldn't just one neuron be enough? If we use gradient descent, the initial weights don't matter, and gradient descent will find the correct weights with the least cost. So …
Why have 3 neurons doing the same thing, all trying to find the correct weights?
Do we really use gradient descent at all in neural networks? If not, how does each neuron find the correct w?
If each neuron is doing logistic regression, why do we need neural networks? Is the only reason to scale up logistic regression?
It is true that gradient descent attempts to find the lowest cost, but there is a chance that the lowest cost achievable with 1 neuron is higher than the lowest cost achievable with 3 neurons.
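Here is a minimal sketch you can run to see that capacity gap for yourself. I am assuming TensorFlow/Keras as in the course's labs; the XOR-style data and the hyperparameters are just my own illustration, not from the lecture:

```python
import numpy as np
import tensorflow as tf

# XOR-style data: not linearly separable, so a single sigmoid unit
# (i.e., logistic regression) cannot reach a low cost on it.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 0.])

def make_model(hidden_units):
    layers = []
    if hidden_units > 0:
        layers.append(tf.keras.layers.Dense(hidden_units, activation='sigmoid'))
    layers.append(tf.keras.layers.Dense(1, activation='sigmoid'))
    model = tf.keras.Sequential(layers)
    model.compile(loss='binary_crossentropy',
                  optimizer=tf.keras.optimizers.Adam(learning_rate=0.1))
    return model

one_neuron = make_model(hidden_units=0)     # no hidden layer
three_neurons = make_model(hidden_units=3)  # 3-unit hidden layer

one_neuron.fit(X, y, epochs=500, verbose=0)
three_neurons.fit(X, y, epochs=500, verbose=0)

print('lowest cost, 1 neuron :', one_neuron.evaluate(X, y, verbose=0))
print('lowest cost, 3 neurons:', three_neurons.evaluate(X, y, verbose=0))
```

The 1-neuron model typically plateaus at a much higher cost than the 3-neuron model, no matter how long you train it.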
Yes, we use gradient descent to update all weights in all neurons. We first compute the gradients in the output layer (layer 2); those gradients are proportional to the error. We then propagate that error information back to the hidden layer (layer 1). We only have labels for the output layer, but we propagate the error backward. This technique is called "backpropagation" and will be briefly discussed later in the course; more discussion and examples are left to more advanced courses.
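For the curious, here is a rough NumPy sketch of one forward plus backward pass for a tiny 2-3-1 network. All weights and inputs here are hypothetical (and already normalized; raw inputs like [200.0, 17.0] would saturate the sigmoid, which is why the labs normalize features first):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A tiny 2 -> 3 -> 1 network with hypothetical weights.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)   # layer 1 (hidden)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)   # layer 2 (output)

x = np.array([[0.8, -0.3]])   # one (already normalized) training example
y = np.array([[1.0]])         # its label -- only the output layer has labels

# Forward propagation: compute every layer's activations, input to output.
a1 = sigmoid(x @ W1 + b1)
a2 = sigmoid(a1 @ W2 + b2)

# Backward propagation: start from the output error and push it backward.
d2 = a2 - y                        # gradient of the logistic loss w.r.t. layer 2's pre-activation
grad_W2 = a1.T @ d2
d1 = (d2 @ W2.T) * a1 * (1 - a1)   # the error information propagated to layer 1
grad_W1 = x.T @ d1

# One gradient-descent step on all weights in all neurons.
alpha = 0.01
W2 -= alpha * grad_W2; b2 -= alpha * d2.sum(axis=0)
W1 -= alpha * grad_W1; b1 -= alpha * d1.sum(axis=0)
```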
Logistic regression is a term for the formalism $y = \sigma(\vec{w} \cdot \vec{x} + b)$ trained under the logistic loss function. While a 1-layer neural network with a sigmoid activation and the logistic loss can reasonably be called the neural-network equivalent of logistic regression, it would be better to call the network in that video a neural network for a binary classification problem.
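As a quick sketch of that equivalence (the weights and input below are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.1, -0.2])   # hypothetical weights
b = 0.5
x = np.array([0.8, -0.3])   # hypothetical (normalized) input

# Logistic regression: y = sigma(w . x + b).
# A single Dense unit with sigmoid activation computes exactly the same thing.
y_hat = sigmoid(np.dot(w, x) + b)
print(y_hat)
```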
With that in mind, I would not say “each neuron is doing logistic regression”. Instead, the whole network does binary classification.
Recognizing the neural network as a whole is important.
The neurons in layer 1 are NOT doing logistic regression. Instead, the 3 neurons there transform the input in 3 different ways, in the hope that such diversity gives a better chance of coming up with some good transformations. The layer-2 output neuron then takes those transformed inputs to make the final prediction.
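In code, the inference step from the lecture looks roughly like this (a sketch; with untrained, randomly initialized weights the printed numbers will not match the lecture's [0.2, 0.7, 0.3], since it is training that makes the 3 transformations useful):

```python
import numpy as np
from tensorflow.keras.layers import Dense

x = np.array([[200.0, 17.0]])   # layer 0: the input from the lecture

layer_1 = Dense(units=3, activation='sigmoid')
layer_2 = Dense(units=1, activation='sigmoid')

a1 = layer_1(x)   # 3 units -> 3 different transformations of the same input
a2 = layer_2(a1)  # the single output unit combines them into one prediction
print(a1.numpy(), a2.numpy())
```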
Hmm, I thought it always found the lowest cost no matter what; I'm not sure I understand what could prevent it from finding the lowest cost. At least that is what I understood from course 1.
Does this only happen in backward propagation? What happens in forward prop, then? Are forward and backward used together?
I'm not sure I understand this completely. I think what you are trying to say is that each neuron does not necessarily do logistic regression; it does something similar to that, and much more. Is that right? So far, based on the course, I am unable to understand (and it is also not explained clearly) what is really calculated in so many layers and neurons. I wish there were a real-life example, maybe the coffee roasting one, that really explains what happens at each neuron in each layer: what the exact input and output of each neuron/layer is, and how the final prediction is calculated. All the explanations so far assume a certain input and output in a very vague way.
Yes, but what does each neuron do or add to the whole process? A neuron-by-neuron explanation with an example would help.
This is exactly what I don't get: what are these individual transformations, and in what ways do they differ apart from the initial weights being different? Maybe they operate on different features, but what tells them to use a specific set of features? I think I'm confused.
No. It is not the case that we find the exact SAME lowest cost regardless of the network's architecture, and you will see that when you start to experiment on your dataset with different architectures.
Since a larger architecture is more expensive to train and to use, it must count for something, right? And that is the performance (or the cost).
It is not absolutely true that the larger the architecture, the better the performance, but there is some relation. That relation is something you need to learn from experiments and from the lectures.
In course 2 there will be lectures about how to adjust the architecture to get better performance, and there will be assignments which use TensorFlow to build some architectures. My suggestion is to finish course 2 first, including passing all the assignments; after that, you might adjust the architectures yourself and see how doing so sometimes improves performance and sometimes degrades it.
If you ever have any questions about the relation between architecture and performance, answer them yourself with experiments.
Forward and backward propagation are two processes: "forward" comes first, and then "backward" comes next. The "error propagation" I described happens only in the backward propagation process.
Since forward prop and back prop are the two processes that MLS courses 1 & 2 and Deep Learning Specialization course 1 are about, I will let you learn them through the courses. It will take some time to complete them, but let's be patient, since the courses are designed for that.
What I wanted to say is: don't mix up logistic regression with a neural network's neurons. No neuron in a hidden layer does logistic regression. The output layer's neuron may be seen as doing it, but only the output layer.
As I said above, I will let you learn it through the courses and assignments.
Please be patient and learn it through the courses and assignments. One suggestion is to free your mind from thinking of a neuron as logistic regression. Leave that open to any other ideas that can come up in the process of learning.
It is normal for any learner not to grasp the whole idea within just the time taken by course 1 and the first week of course 2 (since you posted this thread in Course 2 Week 1). Please be patient and let yourself learn it from the courses.
You will see in the courses that each layer is just some mathematical operations which transform the layer's input into some output, and that output then serves as the input for the next layer, which again transforms it into some other output.
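In the spirit of the course's optional labs, a sketch of that layer-by-layer transformation (weights and input are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense(a_in, W, b):
    """One layer: a mathematical operation that transforms its input into an output."""
    return sigmoid(a_in @ W + b)

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)   # layer 1: 2 inputs -> 3 outputs
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)   # layer 2: 3 inputs -> 1 output

a0 = np.array([[0.5, -1.2]])   # hypothetical input (layer 0)
a1 = dense(a0, W1, b1)         # layer 1's output ...
a2 = dense(a1, W2, b2)         # ... becomes layer 2's input; a2 is the prediction
print(a1, a2)
```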
I think, to begin with, this is the least we will get after finishing the courses I have mentioned (MLS courses 1 & 2 and DLS course 1).
@ronnyfrano, I can see your passion for getting everything understood in a way that makes you feel confident, but it takes time to think and learn through the MLS courses and DLS course 1 to build up the foundations.
I recommend you go through the whole process and try to come up with some theories that connect all the dots, and by that time, we can discuss your understanding.
I want to emphasize once again that it is important for you to go through the courses (MLS C1 & C2, and DLS C1), think through the materials, and come up with an understanding of your own that connects all the dots.
This is a learning process, and this process takes time.
The questions that you are having are very normal for any learner, but they are also questions which require your time and effort to think and learn.
I understand why you have asked the questions, but I also need you to understand that I have to refer you back to the courses so that you can learn the answers yourself.
As I have repeated multiple times, don't presume what a neuron does before continuing with the courses; instead, open yourself to whatever is delivered by the courses.
The less we presume, the more room we are allowing ourselves to accept new views.
The more we presume, the more likely we are to look for evidence to prove our presumption, but that evidence might just not exist. This can end in confusion.
There is a reason why people call a neural network a black box, so to begin with, don't try to come to a conclusion so soon, such as "it is doing a logistic regression"; instead, try to accept the mathematical nature of the neural network. The mathematical nature is boring and plain and sounds uninsightful, but it is the foundation, and a way of thinking about neural networks with the least presumption.
I saw you on another thread about how to become an ML expert, so after you finish the courses, if you are still interested in the same question of what a neuron does, let's talk about this again. At that time, if you don't mind, instead of just asking questions, share your view of what a neuron does based on your understanding of the maths. Even though not everyone feels it necessary to understand it that way, I personally think it's going to be helpful, and I guess many experts can explain it that way.
There will be no need to defend your stance in this or any previous question. We can let go of it, unless you still hold the same stance and can justify it to some degree based on what you will have learnt. As a sign of a fresh attempt, you may start a new thread if you want.
Instead of questions, I want to read how you finally understand it. There is no time limit: you may post it after finishing the courses (and perhaps reading some additional materials elsewhere), or you may post it ten years later, while in the meantime continuing to discuss any other topics of interest in this community. That's all fine.
It is just that if you post it so late that I can't read it, I believe you will still receive a lot of useful feedback from the others. However, all of the above are my suggestions, and you do not have to do anything I suggested.
Wow @rmwkwok, you are so passionate about helping others. I have never seen someone so keen on replying to community posts. You are awesome. If there is a place to provide official feedback for you, let me know; I will be happy to.
I’d be happy to put up a separate thread after the course completion.
Hi @ronnyfrano, thanks. Your response already means everything. When it comes to concepts, I always think there is no single answer, even to the same question, when it is asked by different people. Let's see how we will work that out.
I'm having the same questions as you, at the same point in the course. What are the neurons actually doing? How are they adding up the multiple inputs and outputting a single value? How is the training happening? In the previous course, the summing happened during the regression part, right? But there isn't regression here yet, so how is it summing? I feel like I'm missing something.