Hello everyone,

Here I am again with a probably very basic question.

Each ‘neuron’ is divided into two parts: a first part, which applies Z = W.X + b (the linear function), and a second part, which applies Sigmoid of Z (the activation function) to squash Z into an output between 0 and 1. Trying to understand the importance of each part (and please correct me if I’m mistaken), I come to the conclusion that if we did not have the linear function, we would have no way of improving our algorithm, because there would be no parameters to update (and so no need for the cost function); and if we had no activation function, we would have no way of passing the output of Layer 1 as an input to Layer 2, so we would be cutting the communication between layers, effectively killing the dynamic of Forward and Back Propagation.
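To make sure I’m reading the two parts correctly, here is a tiny plain-Python sketch of a single neuron as I understand it (the weights and inputs are just made-up numbers of my own):

```python
import math

def sigmoid(z):
    # Activation function: squashes any real z into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def neuron(w, x, b):
    # Part 1: the linear function Z = W.X + b
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    # Part 2: the activation function, Sigmoid(Z)
    return sigmoid(z)

print(neuron([0.5, -0.5], [1.0, 2.0], 0.0))  # sigmoid(-0.5) ≈ 0.378
```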

Regarding the linear function, what specifically is the purpose of the bias term, besides being one more parameter that can be updated? Searching a little about it, I see that its purpose is, among others, to generate an output from the linear function even when the input X is zero. But if the input X is zero, why would I want this Z (which would just be the constant b, since W.X = 0) to be passed on to the activation function and along to the other layers? I also read that it is important to have it as an updatable parameter to improve the model’s accuracy; but here I’m tempted to ask: why not then add a third and fourth parameter just to have them updated as well (effectively creating a new linear function, Z = W.X + b [- xyz, …]) and improve the general model accuracy?
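Here is a small numerical illustration of the X = 0 case I’m asking about (again, just my own toy numbers):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, x = 2.0, 0.0  # the input feature is zero

# Without a bias, the neuron is stuck at sigmoid(0) = 0.5 whenever x = 0,
# no matter what value of W is learned
print(sigmoid(w * x))  # 0.5

# With a bias, the neuron can still give a confident output at x = 0
for b in (-3.0, 3.0):
    print(b, sigmoid(w * x + b))  # ≈ 0.047 and ≈ 0.953
```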

The role of W is a little clearer to me, as it allows us to increase the strength (weight) of certain features of X, but I still can’t clearly see the really important role of b (and what we would miss if the linear function were simply Z = W.X).
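To show what I mean by “what would we miss”, here is the kind of 1-D experiment I tried (plain Python, the values are my own):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# In 1-D the neuron "fires" (output > 0.5) exactly when w*x + b > 0,
# i.e. when x > -b/w (for positive w)
w = 2.0

# Without b, the threshold is pinned at x = 0: the decision boundary
# has to pass through the origin no matter how W is trained
assert sigmoid(w * 0.5) > 0.5
assert sigmoid(w * -0.5) < 0.5

# With b = 3.0 the threshold shifts to x = -1.5: same weight, moved boundary
b = 3.0
print(sigmoid(w * -1.0 + b))  # x = -1.0 now fires: sigmoid(1.0) ≈ 0.731
```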