Sorry, this is sort of a simple question, but I'm suddenly wondering whether I have been misunderstanding this all along.

In a ‘dense’ layer of the network, the weights are obviously tied to the activations of the layer before it. But we still have biases, which are adjusted at the ‘per layer’ level, right?

Or where did those disappear to? I kind of thought that was the whole purpose of the biases?

*He he…* Maybe the node in this picture from the lecture is connected… but ‘isn’t’? I didn’t think we did dropout on the *data*.

I’m not familiar with the material in NLP C3, but note that there are two ways to include the bias:

- You can have an explicit bias term, as is done in all the DLS courses.
- You can include it as an extra “weight” whose feature value is always 1. Given the way that x_0 is faded in that image, I’m guessing they are doing it this way here. Notice that you end up with n + 1 weight values in that formulation, where n is the number of input features.
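To check that the two options give the same result for a dense layer with several units, here is a minimal plain-Python sketch. The function names `dense_explicit_bias` and `dense_folded_bias` and all the numbers are made up for illustration; they are not taken from either course. In the folded version, each unit's bias simply becomes weight 0 of its row, so every row has n + 1 entries, and the input gets an extra feature x_0 = 1:

```python
# Hypothetical illustration: a dense layer's pre-activations computed two ways.
# W has one row per unit (n weights each); b has one bias per unit.

def dense_explicit_bias(W, b, x):
    """Explicit bias: z_k = sum_j W[k][j] * x[j] + b[k] for each unit k."""
    return [sum(w_kj * x_j for w_kj, x_j in zip(W_k, x)) + b_k
            for W_k, b_k in zip(W, b)]

def dense_folded_bias(theta, x):
    """Folded bias: each row of theta has n + 1 entries (bias first),
    and the input is augmented with a constant feature x_0 = 1."""
    x_aug = [1.0] + list(x)
    return [sum(t_kj * x_j for t_kj, x_j in zip(theta_k, x_aug))
            for theta_k in theta]

W = [[0.5, -1.0, 2.0],
     [1.5, 0.25, -0.5]]   # 2 units, n = 3 input features
b = [0.1, -0.3]           # one bias per unit
x = [1.0, 2.0, 3.0]

# Build the (n + 1)-column version by prepending each unit's bias to its row.
theta = [[b_k] + W_k for W_k, b_k in zip(W, b)]

print(dense_explicit_bias(W, b, x))
print(dense_folded_bias(theta, x))
```

Both calls should print the same pre-activations, up to floating-point rounding. So the biases don't disappear in the second formulation: there is still one per unit, it is just stored alongside that unit's weights.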


@paulinpaloalto Thanks, that is possible; the language used in the different courses is not ‘100%’ the same. Good for learning, but sometimes confusing.

I will look into it.

Actually, now that I think ε more about it, I am familiar with the material in NLP C1, and I remember that they did exactly what I described in the second option above in the very first assignment, C1 W1 Logistic Regression. The weights are the vector θ, which has n + 1 elements, where n is the number of input features. You can see that they always set x_0 to 1.
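Written out for a single example, and assuming the feature vector is x = (x_0, x_1, …, x_n) with x_0 = 1 and θ = (θ_0, θ_1, …, θ_n) as described above, the n + 1 formulation reduces to the familiar explicit-bias one, with the sigmoid σ applied as usual for logistic regression:

```latex
\theta^\top x = \sum_{j=0}^{n} \theta_j x_j
             = \theta_0 + \sum_{j=1}^{n} \theta_j x_j \quad (\text{since } x_0 = 1),
\qquad h(x) = \sigma\!\left(\theta^\top x\right)
```

In the DLS notation, θ_0 plays the role of the bias b, and θ_1, …, θ_n are the weights w.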

So maybe they’re actually just being consistent here. Stranger things have happened!