Sorry, sort of a simple question, but I'm suddenly wondering if I have been misunderstanding all along.
In a 'dense' layer of the network, the weights are obviously adjusted based on the activations of the layer before it. But we still have biases, which are adjusted at the 'per layer' level, right?
Or where did those disappear to? I kind of thought that was the whole point of having biases.
He he… maybe this picture from the lecture shows a node that is drawn as connected but actually isn't? Or I didn't think we did dropout on the data?
I’m not familiar with the material in NLP C3, but note that there are two ways to include the bias:
- You can have an explicit bias term, as is done in all the DLS courses.
- You can include it as an extra "weight" with a feature value of 1. Given the way that x_0 is faded in that image, I'm guessing maybe they are doing it this way here. Notice that you end up with n + 1 weight values in that formulation (see the sketch below).
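Just to make the two formulations concrete, here is a minimal NumPy sketch (my own toy example, not code from either course) showing that they compute exactly the same pre-activation values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shapes, just for illustration: m examples, n input features, k units.
m, n, k = 4, 3, 2
X = rng.normal(size=(m, n))

# Formulation 1: explicit bias term, as in the DLS courses.
W = rng.normal(size=(n, k))   # weights, shape (n, k)
b = np.zeros((1, k))          # one bias value per unit in the layer
z1 = X @ W + b                # bias broadcasts across all m examples

# Formulation 2: fold the bias into the weights by adding a constant
# feature x_0 = 1 to every example, giving n + 1 "weights" per unit.
X_aug = np.hstack([np.ones((m, 1)), X])   # prepend the x_0 = 1 column
W_aug = np.vstack([b, W])                 # first row plays the role of b
z2 = X_aug @ W_aug

print(np.allclose(z1, z2))   # True: the two formulations agree
```

The only real difference is bookkeeping: in the second formulation the bias update just falls out as the weight update for the x_0 column.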
@paulinpaloalto thanks, that is possible; the language between the courses is not 100% the same. Good for learning, but sometimes confusing.
I will look into it.
Actually, now that I think more about it, I am familiar with the material in NLP C1, and I remember that they did exactly what I described in option 2) above in the very first assignment, C1 W1 Logistic Regression. The weights are the vector \theta and there are n + 1 elements, where n is the number of input features. You can see they always set x_0 to 1.
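From memory, that formulation looks roughly like the sketch below. This is my own toy reconstruction, not the assignment code, and the variable names and shapes are my assumptions: prepend a constant x_0 = 1 column to the inputs so that \theta has n + 1 entries and \theta_0 plays the role of the bias.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# m examples with n features each; the numbers are placeholders.
m, n = 5, 3
rng = np.random.default_rng(1)
X = rng.normal(size=(m, n))

# Prepend the constant feature x_0 = 1, so X becomes (m, n + 1).
X = np.hstack([np.ones((m, 1)), X])

# theta has n + 1 elements; theta[0] acts as the bias b.
theta = np.zeros((n + 1, 1))

h = sigmoid(X @ theta)   # predictions, shape (m, 1)
```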
So maybe they’re actually just being consistent here. Stranger things have happened!