Non-Convexity of Neural Networks

Is the stated reason for the non-convexity of a neural network's cost function correct?
If I break down the activation of the final layer, I can obtain multiplications between two coefficients, or weights (w). So the reason seems valid to me :slight_smile:
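
For example, here is a toy expansion (a one-input, one-hidden-unit network with identity activations, chosen only to make the algebra explicit - not a claim about general networks) showing where the product of weights comes from:

```python
# Toy expansion: the final layer's output contains a product of weights
# from two different layers.
import sympy as sp

x, w1, b1, w2, b2 = sp.symbols('x w1 b1 w2 b2')

a1 = w1 * x + b1          # hidden unit (identity activation, for the sake of expansion)
y_hat = w2 * a1 + b2      # output of the final layer

print(sp.expand(y_hat))   # -> contains the term w1*w2*x, a product of two weights
```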


Based on the plot, it seems correct: there is at least one local minimum and one global minimum. Meanwhile, x^2 and y^2 are trivially convex, so each has a single minimum, which is therefore its global minimum.
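
A quick sanity check of that claim, assuming sympy is available:

```python
# x^2 has a constant positive second derivative, which is what makes it convex
# (the same holds for y^2 by symmetry).
import sympy as sp

x = sp.symbols('x', real=True)
print(sp.diff(x**2, x, 2))   # -> 2
```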


Is it correct :thinking:? @nydia


Well, it is one of the main reasons, but I would say it is not the only one.


What could be the other reasons?


@rmwkwok I want your response


Note, this thread is a duplicate of another forum discussion.

Hi @farhana_hossain,

To begin with, the 3D graph is neither f(x, y) = xy nor f(x, y) = x^2 y^2. f(x, y) = xy is a saddle-shaped surface (it is linear in each variable separately), and f(x, y) = x^2 y^2 never takes negative values.
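
For reference, here is a minimal plotting sketch (assuming numpy and matplotlib are installed) of what those two surfaces actually look like:

```python
# Plot the two surfaces under discussion: a saddle for x*y,
# and a non-negative "valley" for x^2 * y^2.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (only needed on older matplotlib)

x = np.linspace(-2, 2, 100)
y = np.linspace(-2, 2, 100)
X, Y = np.meshgrid(x, y)

fig = plt.figure(figsize=(10, 4))
surfaces = [(X * Y, 'f(x, y) = x y'), (X**2 * Y**2, 'f(x, y) = x^2 y^2')]
for i, (Z, title) in enumerate(surfaces, start=1):
    ax = fig.add_subplot(1, 2, i, projection='3d')
    ax.plot_surface(X, Y, Z, cmap='viridis')
    ax.set_title(title)
plt.show()
```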

Please replace the graph with the correct ones. If you want to explain something, the content should be coherent - the graph should actually be the graph of the function you are discussing.

Then, read this Wiki for the definition of a convex function. A practical test is to check whether the second derivative is non-negative everywhere; for a function of two variables like these, the analogue is that the Hessian of second derivatives is positive semidefinite. Take the second derivatives of f(x, y) = xy and f(x, y) = x^2 y^2, and tell us (i) whether they are always positive and, if not, (ii) when they are positive.
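
For reference, a minimal sympy sketch of that check (assuming sympy is available):

```python
# Compute the Hessian of second derivatives for both functions.
# A convex function needs a positive semidefinite Hessian everywhere;
# a negative determinant of a 2x2 Hessian rules that out at that point.
import sympy as sp

x, y = sp.symbols('x y', real=True)

for f in (x * y, x**2 * y**2):
    H = sp.hessian(f, (x, y))
    print(f, H.tolist(), sp.factor(H.det()))

# x*y       -> Hessian [[0, 1], [1, 0]],                    det = -1
# x**2*y**2 -> Hessian [[2*y**2, 4*x*y], [4*x*y, 2*x**2]],  det = -12*x**2*y**2
```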

Cheers,
Raymond

Happened to find this on Google. May need it later.


No. For the first one, the second derivative is 0 (neither positive nor negative), and for the second one it is 2y^2, which is positive as long as y is not 0.


Maybe you can then update your explanation by incorporating some correct plots and your new findings so far? We can then base the discussion on the updated version.

I want you to watch this lecture of Andrew's and take it into consideration. Think about this: a linear activation makes multiple layers equivalent to a single layer. Would that contradict anything in your current explanation?
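
To make that concrete, here is a small numeric sketch (made-up layer shapes, numpy only) of why stacked linear layers collapse into a single one:

```python
# Stacking linear layers collapses to one linear layer:
# W2 @ (W1 @ x) == (W2 @ W1) @ x.
import numpy as np

rng = np.random.default_rng(0)
x  = rng.normal(size=(3, 1))   # input
W1 = rng.normal(size=(4, 3))   # layer-1 weights (hypothetical shapes)
W2 = rng.normal(size=(2, 4))   # layer-2 weights

two_layers = W2 @ (W1 @ x)
one_layer  = (W2 @ W1) @ x     # a single equivalent layer with weights W2 @ W1
print(np.allclose(two_layers, one_layer))  # True
```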

No rush on this - please don't rush. It is not always interesting to see a rushed job, not when the goal is learning.

Raymond


Thanks, Raymond! I will return to this discussion soon.


Don’t be too soon. Make it interesting.

Cheers :wink:


Yes, it does contradict my explanation when all the hidden layers use linear activations.
If every hidden-layer activation is linear, the network collapses to a single layer and the cost would be convex even though weights are still multiplied together - but then it is not a real neural network.
So I thought that in the other cases, i.e. with non-linear activations in a real neural network, the multiplication between two weights (w) might lead to non-convexity.
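
As a rough illustration (a toy, hand-picked example, not anything from the course), here is a numeric check that the squared-error cost of a one-hidden-unit tanh network is not convex in its two weights:

```python
# Toy check: the squared-error cost of y_hat = w2 * tanh(w1 * x) on one example,
# viewed as a function of (w1, w2), violates the convexity inequality.
import numpy as np

x_train, y_train = 1.0, 0.5                   # a single made-up training example

def cost(w):
    w1, w2 = w
    y_hat = w2 * np.tanh(w1 * x_train)
    return 0.5 * (y_hat - y_train) ** 2

w_fit = np.array([2.0, 0.5 / np.tanh(2.0)])   # fits the example exactly (cost = 0)
w_flip = -w_fit                               # flipping both signs gives the same prediction
w_mid = 0.5 * (w_fit + w_flip)                # midpoint is (0, 0), which predicts 0

print(cost(w_fit), cost(w_flip), cost(w_mid))
# Convexity would require cost(w_mid) <= 0.5 * (cost(w_fit) + cost(w_flip)),
# but here cost(w_mid) = 0.125 while the right-hand side is 0.
```

The sign-flip symmetry of the weights creates two equally good solutions with a worse point in between, which is impossible for a convex cost.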


So you admit that a non-linear activation is needed. Right?

@farhana_hossain, collect your thoughts, organize them, and rewrite your explanation, okay? Then we can discuss your latest version, with the right plots.

I won’t return to this thread in the next 24 hours.
