Why is a bias term needed?

So I know that the bias is needed to shift the line of best fit along an axis (in some cases the x-axis, in others the y-axis), but why? Why can't we just shift the origin so that the line always passes through the origin?

What does it mean to "shift the origin" from a mathematical point of view? I have not taken this course, so I don't specifically know what problem is being solved here, but if it is using Machine Learning, then bear in mind that you don't know what the solution is a priori, right? So how do you figure out the right amount by which to "shift the origin" to eliminate the bias? How does that simplify things relative to just including the bias term and letting the algorithm learn its value along with all the other weights (parameters) that are being learned?

Mathematically, that’s the same thing as adding a bias.

Yes, I have read this on a blog, but I couldn't comprehend it. Would you like to help me understand?

I know why they are variables: they're just a means of changing the fit based on learning.

Sure, but I think Tom and I answered the question, didn’t we? If you don’t include bias, then you are putting a very severe artificial restriction on the solutions that you can find. It’s exactly analogous to saying you will only accept lines through the origin. Yes, you’re right that this is logically equivalent to applying two transformations in order: shift the origin by the bias amount and then do the “rotate and scale” linear transformation. But the problem is that the bias is learned, right? You don’t know what it is until after you run the training. So Prof Ng has given you a way to learn both transformations in one step. You want to make it two steps, so it’s your job to show us how you could do that and why that is somehow better than Prof Ng’s method.

Not sure if this will help.

In an earlier version of the ML course, Andrew taught that the 'bias' value was simply considered to be another weight.

So a straight line would have two weights - one for the slope and one for the y-offset. This vector of weights was called ‘theta’.

Mathematically this is implemented using feature values of [1, x] for every example. So the predictions in that form are h = X * theta.

This lets you use a single dot product to compute the entire model, instead of what these DLAI courses do (f_wb = w*x + b), which is a dot product for the weights followed by adding the bias separately.

Mathematically the two methods are identical.
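
Here is a minimal NumPy sketch of that equivalence. The names X, theta, w, b, and f_wb come from the posts above; the numeric values are made up purely for illustration, not taken from either course:

```python
import numpy as np

# Toy data: 4 examples with a single feature x (values made up for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0])

# Method 1 (these DLAI courses): separate weight and bias.
w, b = 2.0, 0.5
f_wb = w * x + b                            # multiply by the weight, then add the bias

# Method 2 (earlier ML course): prepend a constant feature of 1 and fold the bias into theta.
X = np.column_stack([np.ones_like(x), x])   # feature values [1, x] for every example
theta = np.array([b, w])                    # theta[0] is the bias, theta[1] is the slope
h = X @ theta                               # a single dot product computes the whole model

print(np.allclose(f_wb, h))                 # True: the two formulations give identical predictions
```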

Transformations! Yes, it is the operation that changes one basis to another.

Matmul is nothing but a transformation. Finally, I got the answer. And @TMosh, this is why we use the bias as a weight for a constant input (1). Otherwise it would look like we are changing the origin, and that would violate the rules for what we call a linear transformation.

Somehow it all makes sense now.

The mathematical term for what we are doing here is an Affine Transformation, which is a linear transformation plus the bias.
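
To put that in symbols (standard notation, not taken from the course itself):

$$
\text{linear: } f(x) = Wx \;\Rightarrow\; f(0) = 0, \qquad
\text{affine: } f(x) = Wx + b \;\Rightarrow\; f(0) = b
$$

A purely linear map always sends the origin to the origin, which is exactly the "only lines through the origin" restriction mentioned earlier; the bias b is what lets the affine map move away from that.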

Ah, this term came up again.

I first heard it while learning the difference between a vector space and an affine space. What I know is that "affine" has to do with how the origin is treated. So now that you mention it, it makes more sense to me.

Transformation, because it is changing something, and linear because the maximum power of x is 1.

Found something interesting and worth sharing:

https://personal.math.ubc.ca/~cass/courses/m309-03a/a1/olafson/affine_fuctions.htm