Week 2 - Why do we need activation functions? A dilemma with respect to polynomial features

Referring to: https://www.coursera.org/learn/advanced-learning-algorithms/lecture/mb9bw/why-do-we-need-activation-functions

As well as

Question

If we feature scale and use polynomial features, won’t linear activation functions perform polynomial regression?
Wouldn’t this invalidate the idea that using linear activation functions in multiple neurons is just equivalent to performing linear regression once?

A linear activation function used in many neurons is equivalent to just doing linear regression once - I understand that

Where I am confused is: what if we feature engineer to use polynomial features? Won’t the linear activation functions perform polynomial regression then (please see image 2 from Course 1, Week 2 of the ML Specialization)?

Images

Please see the 2 images below:

Yes.

Sorry, I don’t understand what you’re asking here. What are “multiple neurons” and “perform linear regression once”?


Thank you for answering the first question.

What I meant to ask is:

  1. In the lecture we see that a linear activation function in multiple layers is as good as using linear regression.

  2. So my question is, if we use polynomial features, x, x^2, x^3 - will having multiple layers with a linear activation function still be as good as linear regression?

As an example, in the NN architecture below, what if we were to use polynomial features as well?

Please let me know if this makes sense.

The part that is confusing me is:

  1. Linear Activation Functions in all layers is pointless - That’s correct
  2. However, if we were to use polynomial features, what is the effect of using a linear activation function? Does this introduce non-linearity (since the features are polynomial), so we are okay? Or do we still end up with the same outcome, where we should just use polynomial regression rather than a neural network altogether?

It’s not “as good as”. It’s exactly the same thing.

I don’t completely understand what you’re asking.

Maybe this gets close: If you use a NN with a non-linear function in the hidden layer, then you don’t need to engineer your own polynomial features.
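To make that concrete, here is a minimal TensorFlow/Keras sketch (a toy setup of my own, not from the lecture): with a non-linear activation in the hidden layer, the network can fit a curved target from the raw feature x alone, with no hand-engineered x^2 column.

```python
import numpy as np
import tensorflow as tf

# Toy data: a quadratic target, but we only feed the raw feature x
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(500, 1)).astype("float32")
y = (2.0 * x**2 - x + 1.0 + 0.1 * rng.normal(size=(500, 1))).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(16, activation="relu"),  # non-linear hidden layer
    tf.keras.layers.Dense(1),                      # linear output unit for regression
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=200, verbose=0)  # approximates the curve from x alone
```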

This does get close and answers another question I had.

Sorry about not being able to form the question well. I’ll give another try below. Thanks for the patience :slight_smile:

I’m seeking further clarification on a specific aspect discussed during the lecture, particularly related to the use of linear activation functions in neural networks.

The lecture highlighted that employing a linear activation function across multiple layers essentially yields the same effect as conducting linear regression. This leads me to ponder the scenario where polynomial features (such as x, x^2, x^3, etc.) are incorporated into the model. Specifically, my question is:

  • If we integrate polynomial features within a neural network architecture that employs linear activation functions across its layers, does this approach still equate to performing linear regression? Or does the inclusion of polynomial features introduce a level of non-linearity that makes this configuration more advantageous than mere linear regression?

To illustrate, consider a neural network architecture as follows, but with an added twist of incorporating polynomial features.

The core of my confusion lies in understanding the impact of linear activation functions when used in conjunction with polynomial features:

  1. Is using linear activation functions in all layers still considered unnecessary if we include polynomial features?
  2. Does the use of polynomial features with linear activation functions introduce any non-linearity to the model, thereby justifying the neural network’s architecture over traditional polynomial regression?

I appreciate your insights on this matter, as it’s a point of confusion that I’m eager to resolve.

Hello @adishri,

Thank you for elaborating your questions!

I will put it this way:

A: original features + multiple hidden layers with “linear” activation + an output layer
B: original features + an output layer
C: polynomial features + multiple hidden layers with “linear” activation + an output layer
D: polynomial features + an output layer

Here:

  • A & B are equivalent, and they are both linear regressions of the input features (which are the original features)

  • C & D are equivalent, and they are both linear regressions of the input features (which are some polynomial features)

  • In A, B, C, and D, their outputs are linear with respect to their inputs, so they are all linear regressions.

  • The polynomial features bear some non-linearity with respect to the original features. The features bear the non-linearity, NOT the neural networks.

  • Therefore, with respect to the original features, C & D carry some non-linearity which is brought NOT by the neural networks BUT by the feature engineering process.
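To see the first two bullets numerically, here is a small NumPy sketch (a toy example, not from the course): hidden layers with “linear” activation followed by a linear output collapse algebraically into one linear layer, so the stacked network predicts exactly what a single linear regression on the same input features would predict. Feeding it polynomial features instead of the original ones gives the C & D case.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # 5 examples, 3 input features (original or polynomial)

# Two hidden layers with linear activation, then a linear output layer (case A / C)
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)
W3, b3 = rng.normal(size=(2, 1)), rng.normal(size=1)
deep_out = ((X @ W1 + b1) @ W2 + b2) @ W3 + b3

# The same computation collapsed into one linear layer (case B / D)
W = W1 @ W2 @ W3
b = (b1 @ W2 + b2) @ W3 + b3
single_out = X @ W + b

print(np.allclose(deep_out, single_out))  # True: the stack is one linear regression
```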

Yes, that neural network is still a linear regression with respect to its input (which is a set of some polynomial features).

First, I think you were implying that, with polynomial features, it is not called linear regression. This is wrong. This is still a linear regression because our model is always only establishing a linear relationship between the inputs and the output, and it does not care whether the inputs are non-linear to something else.

So, as said, both with the polynomial features and with the original features, they are linear regression. All of the A, B, C, and D are linear regression.

However, using some polynomial features DOES bring some non-linearity in, as compared to the original features, but whether it is beneficial to do so remains to be tested.
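As a concrete illustration that this is still linear regression (a small NumPy sketch, not from the course): fitting a cubic-shaped target with the engineered columns 1, x, x^2, x^3 is just an ordinary linear solve for the weights.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=100)
y = 0.5 * x**3 - x + 2 + 0.05 * rng.normal(size=100)  # cubic target with a little noise

# "Polynomial regression": linear regression on the design matrix [1, x, x^2, x^3]
X_poly = np.column_stack([np.ones_like(x), x, x**2, x**3])
w, *_ = np.linalg.lstsq(X_poly, y, rcond=None)  # ordinary linear least squares
print(w)  # roughly [2, -1, 0, 0.5]: the weights come from a linear solve
```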

Cheers,
Raymond


Indeed, Raymond.

“Linear regression” does not mean that the predictions are limited to a straight line of constant slope.


@rmwkwok Thank you so much for explaining this. It really clears up my doubts.

Knowing that A&B are equivalent and so are C&D really made me understand that the linear activation function across all layers of a neural network is the same as just performing linear regression.

I see. For some reason I assumed that if we use polynomial features it’s “polynomial regression” and not linear regression.

Understood. Even if the features are polynomial, the model is still linear, because the model is establishing a linear relationship between the inputs and the outputs.
I understand that what matters here is that f(x) = w * x + b, and x can be anything, but that doesn’t change the fact that the model is performing linear regression.

True that. I think that will be more related to the problem one is trying to solve!

@TMosh that hits home and clears a lot of my misunderstandings as well.

Thank you so much guys :slight_smile: The help means a lot!

“Polynomial regression” is somewhat of a misleading name.
It’s better described as “linear regression with polynomial features”.

“Linear” refers to the linear combination of the weights and features.
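For instance, with polynomial features the model is f(x) = w1 * x + w2 * x^2 + w3 * x^3 + b. The prediction curves as x varies, but it is still a linear combination of the weights w1, w2, w3, and b, which is why fitting it is still linear regression.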


Hello @adishri,

I am glad that I could give you a different perspective!

Well, but as explained on Wikipedia, polynomial regression is also “considered to be a special case of multiple linear regression.” :wink:

Cheers,
Raymond


This was an extremely helpful insight that I hadn’t realized before. For anyone else who comes across this, the second answer here was also helpful: Why is polynomial regression considered a special case of multiple linear regression? - Cross Validated