Non-Convexity of Neural Network

farhana_hossain · December 15, 2023, 11:21am

Is the stated reason for the non-convexity of neural networks correct?
If I break down the final activation of the final layer, I can obtain a product of two coefficients, or weights (w). So the reason seems satisfactory to me.
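To make the weight-product argument concrete, here is a minimal sketch. It assumes a hypothetical one-input model whose prediction is w1 * w2 * x (two weights multiplied, no activation) and a squared-error loss on a single example x = 1, y = 1. A convex loss must satisfy loss(midpoint) <= average of the losses at the two endpoints; the product of weights breaks this.

```python
def loss(w1, w2, x=1.0, y=1.0):
    """Squared error of the hypothetical model w1 * w2 * x on one example (x, y)."""
    return (w1 * w2 * x - y) ** 2

# Two weight settings that both fit the example perfectly.
a = (1.0, 1.0)
b = (-1.0, -1.0)
mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)  # the midpoint (0.0, 0.0)

endpoint_avg = (loss(*a) + loss(*b)) / 2
mid_loss = loss(*mid)

# A convex function must satisfy mid_loss <= endpoint_avg; here it does not.
print(f"average at endpoints: {endpoint_avg}, loss at midpoint: {mid_loss}")
print("convexity violated:", mid_loss > endpoint_avg)
```

Both endpoints are global minima (loss 0), yet the straight line between them passes through a point with higher loss, which a convex function cannot do.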

Nydia · December 15, 2023, 1:14pm

Based on the plot, it seems correct: there is at least one local minimum and one global minimum. By contrast, x^2 and y^2 are trivially convex, since each has only one minimum, which is therefore the global minimum.

farhana_hossain · December 15, 2023, 1:17pm

Is it correct, @nydia?

Nydia · December 15, 2023, 1:24pm

Well, it is one of the main reasons but I would say not the only one.

farhana_hossain · December 15, 2023, 4:05pm

What could be the other reasons?

farhana_hossain · December 15, 2023, 4:07pm

@rmwkwok I want your response

TMosh · December 15, 2023, 7:24pm

Note: this thread is a duplicate of another forum discussion.

rmwkwok · December 16, 2023, 1:16am

Hi @farhana_hossain,

To begin with, the 3D graph is neither f(x, y) = xy nor f(x, y) = x^2y^2. f(x, y) = xy should be a saddle surface (a hyperbolic paraboloid), and f(x, y) = x^2y^2 never takes negative values, because x^2y^2 >= 0 for all x and y.

Please replace it with the correct plots. If you want to explain something, the content should be coherent: the graph should be the graph of the function you are discussing.
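As a quick check of what the two surfaces should look like, without a plotting library, one can evaluate them on a small grid and along the two diagonals through the origin. This is a plain-Python sketch; the grid spacing and range are arbitrary choices. It shows that xy takes both signs, rising along y = x and falling along y = -x (the saddle shape), while x^2y^2 never goes below zero.

```python
def f_xy(x, y):
    return x * y

def f_x2y2(x, y):
    return x ** 2 * y ** 2

# A small symmetric grid of sample points in [-2, 2] (arbitrary choice).
grid = [i / 2 for i in range(-4, 5)]
points = [(x, y) for x in grid for y in grid]

xy_values = [f_xy(x, y) for x, y in points]
x2y2_values = [f_x2y2(x, y) for x, y in points]

print("xy range:", min(xy_values), "to", max(xy_values))
print("x^2 y^2 range:", min(x2y2_values), "to", max(x2y2_values))

# Along y = x, xy = t^2 curves upward; along y = -x, xy = -t^2 curves downward.
for t in (1.0, 2.0):
    print(t, f_xy(t, t), f_xy(t, -t))
```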

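Then, read this Wiki for the definition of a convex function. A practical test is to check whether its second derivative is non-negative everywhere; for a function of several variables such as these, that means checking whether the Hessian (the matrix of second partial derivatives) is positive semidefinite everywhere. Take the second derivatives of f(x, y) = xy and f(x, y) = x^2y^2, and tell us (i) whether they are always non-negative, and, if not, (ii) when they are.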
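For a function of two variables, the "second derivative" in this test is the 2x2 Hessian, and convexity requires it to be positive semidefinite everywhere, i.e. both eigenvalues non-negative. Below is a minimal sketch in plain Python, with the second partial derivatives worked out by hand (for xy: f_xx = 0, f_yy = 0, f_xy = 1; for x^2y^2: f_xx = 2y^2, f_yy = 2x^2, f_xy = 4xy). The evaluation point (1, 1) is an arbitrary choice.

```python
import math

def hessian_xy(x, y):
    # f(x, y) = x*y: f_xx = 0, f_xy = 1, f_yy = 0 (the same at every point)
    return 0.0, 1.0, 0.0  # (f_xx, f_xy, f_yy)

def hessian_x2y2(x, y):
    # f(x, y) = x^2 * y^2: f_xx = 2y^2, f_xy = 4xy, f_yy = 2x^2
    return 2 * y ** 2, 4 * x * y, 2 * x ** 2

def eigenvalues_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], smallest first."""
    mean = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return mean - radius, mean + radius

for name, hess in (("xy", hessian_xy), ("x^2 y^2", hessian_x2y2)):
    smallest, largest = eigenvalues_2x2(*hess(1.0, 1.0))
    # A negative eigenvalue means the Hessian is not positive semidefinite
    # at this point, so the function cannot be convex.
    print(f"{name} at (1, 1): eigenvalues {smallest}, {largest}; PSD: {smallest >= 0}")
```

For xy the eigenvalues are -1 and 1 at every point, so it is a saddle everywhere; for x^2y^2 the Hessian has a negative eigenvalue whenever x and y are both non-zero.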

Cheers,
Raymond

rmwkwok · December 16, 2023, 1:40am

I happened to find this on Google. We may need it later.

farhana_hossain · December 16, 2023, 1:48am

No. For the first one, the second derivative is 0 (neutral), and for the second one it is 2y^2, which is positive if y is not 0.
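Checking only the second derivative with respect to x is not the whole test: convexity needs the second derivative to be non-negative along every direction, including mixed directions that involve the cross term f_xy. The sketch below estimates the second derivative along the direction (1, -1) with a central finite difference (the step size h is an arbitrary choice). It shows that both functions curve downward along that direction, even though f_xx = 2y^2 is never negative for x^2y^2.

```python
def directional_second_derivative(f, point, direction, h=1e-3):
    """Central finite-difference estimate of d^2/dt^2 f(point + t*direction) at t = 0."""
    (px, py), (dx, dy) = point, direction
    f_plus = f(px + h * dx, py + h * dy)
    f_zero = f(px, py)
    f_minus = f(px - h * dx, py - h * dy)
    return (f_plus - 2 * f_zero + f_minus) / h ** 2

def f_xy(x, y):
    return x * y

def f_x2y2(x, y):
    return x ** 2 * y ** 2

# Exact values for comparison: along (1, -1), xy has second derivative -2 at every
# point, and x^2 y^2 has second derivative -4 at the point (1, 1).
print(directional_second_derivative(f_xy, (0.0, 0.0), (1.0, -1.0)))
print(directional_second_derivative(f_x2y2, (1.0, 1.0), (1.0, -1.0)))
```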

rmwkwok · December 16, 2023, 1:55am

Maybe you can then update your explanation by incorporating some correct plots and your new findings so far? We can base the discussion on the updated version.

I want you to watch this lecture by Andrew and take it into consideration. Think about this: linear activations make multiple layers equivalent to a single layer. Would that contradict anything in your current explanation?
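The point about linear activations can be checked numerically. This is a minimal plain-Python sketch; the weights, biases, and input are arbitrary, hypothetical values for a 2 -> 3 -> 1 network. Two stacked layers with the identity (linear) activation, W2 (W1 x + b1) + b2, give the same output as one layer with W = W2 W1 and b = W2 b1 + b2.

```python
def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    """Matrix-matrix product for lists of lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

# Arbitrary, hypothetical weights for a 2 -> 3 -> 1 network with linear activations.
W1 = [[1.0, -2.0], [0.5, 3.0], [-1.0, 1.0]]
b1 = [0.1, -0.2, 0.3]
W2 = [[2.0, -1.0, 0.5]]
b2 = [0.4]

x = [1.5, -0.5]

# Two layers, with the identity activation after the hidden layer.
two_layer = add(matvec(W2, add(matvec(W1, x), b1)), b2)

# The equivalent single layer.
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)
one_layer = add(matvec(W, x), b)

print(two_layer, one_layer)  # equal up to floating-point rounding
```

The collapse holds for any input x, because the composition of affine maps is itself an affine map; inserting a non-linear activation between the layers is what prevents it.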

No rush on this. Don’t rush. Don’t rush. A rushed job is not always interesting, not when it comes to learning.

Raymond

farhana_hossain · December 16, 2023, 2:01am

Thanks, Raymond! I will return to this discussion soon.

rmwkwok · December 16, 2023, 2:14am

Don’t be too soon. Make it interesting.

Cheers

farhana_hossain · December 16, 2023, 2:18am

Yes, it does contradict it when all the hidden layers use linear activations.
If all the hidden layers use linear activations, then it would be convex, because it is not a real neural network.
But I thought that in the other cases, i.e. apart from linear activations in the hidden layers (which do not give a real neural network), the multiplication between two weights (w) might lead to non-convexity.
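To illustrate the case described here, a product of two weights with a non-linear activation in between, consider a minimal sketch assuming a hypothetical network with a single tanh hidden unit, prediction w2 * tanh(w1 * x), trained on one example. Because tanh is an odd function, flipping the signs of both weights leaves the prediction unchanged, so (w1, w2) and (-w1, -w2) have the same loss, while their midpoint (0, 0) predicts 0 and has a higher loss.

```python
import math

X = 1.0
Y = math.tanh(1.0)  # target chosen so that (w1, w2) = (1, 1) fits the example exactly

def predict(w1, w2, x):
    # Hypothetical one-hidden-unit network: tanh activation, linear output weight.
    return w2 * math.tanh(w1 * x)

def loss(w1, w2):
    # Squared error on the single example (X, Y).
    return (predict(w1, w2, X) - Y) ** 2

a, b = (1.0, 1.0), (-1.0, -1.0)
mid = (0.0, 0.0)

# tanh is odd, so flipping both signs gives the same prediction and the same loss.
print(loss(*a), loss(*b))
# The midpoint predicts 0, so its loss is tanh(1)^2, above the endpoint average.
print(loss(*mid), (loss(*a) + loss(*b)) / 2)
```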

rmwkwok · December 16, 2023, 2:21am

So you admit that a non-linear activation is needed. Right?

@farhana_hossain, rewrite your explanation, okay? Collect your thoughts, organize them, and rewrite your explanation, okay? Then we can discuss the latest version of your explanation, with the right plots. Okay?

I won’t return to this thread in the next 24 hours.
