In logistic regression, why not make ‘b’ a vector? Why would each feature share the same b value, when they could be different?

Welcome, @Kukuquack. Consider any given X and w. Pick an example from X, say x^{(i)}. Now suppose you are free to choose a separate bias value b^{(i)} for that example, to minimize the "distance" between the target y^{(i)} and \sigma(w^T x^{(i)} + b^{(i)}). Which value of b^{(i)} would you choose? Once you answer that, consider the predictive value of your model. Zilch, right? You have just created the perfectly overfit model. In statistics parlance, your model is not "identified." That is, for *any* w, there is a "b-vector" (as you conceived it) that creates a perfect fit. Full marks for thinking outside the box!
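To make the point above concrete, here is a small NumPy sketch (my own illustration, not from the course materials): for an *arbitrary* w, we solve for a per-example bias b^{(i)} = logit(y^{(i)}) - w^T x^{(i)}, which drives the training error to essentially zero. Since the sigmoid never reaches exactly 0 or 1, we aim at targets clipped slightly inside (0, 1).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))          # 5 examples, 3 features
y = rng.integers(0, 2, size=5)       # binary targets
w = rng.normal(size=3)               # ANY weight vector, chosen at random

# Solve sigmoid(w @ x_i + b_i) ~= y_i for each example separately.
# The sigmoid never hits exactly 0 or 1, so clip the targets slightly.
eps = 1e-6
target = np.clip(y, eps, 1 - eps)
b = np.log(target / (1 - target)) - X @ w   # b_i = logit(target_i) - w^T x_i

preds = sigmoid(X @ w + b)
print(np.abs(preds - y).max())   # essentially zero: a "perfect" fit
```

No matter which w you start with, the per-example b absorbs all the remaining error, so the training fit is perfect and w carries no information at all: exactly the non-identified, overfit situation described above.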


Thank you, kenb, for your answer.