Is the reason stated behind the reason for the non-convexity of the neural network correct?

If I expand the output of the final layer, products between pairs of weights (w) appear. So the stated reason seems valid to me :slight_smile:

Based on the plot, it seems correct: there is at least one local minimum in addition to the global minimum. Meanwhile, x^2 and y^2 are trivially convex, since each has a single minimum, which is therefore the global minimum.

Well, it is one of the main reasons but I would say not the only one.

What could be the other reasons?

Note, this thread is a duplicate of another forum discussion.

Hi @farhana_hossain,

To begin with, the 3D graph is neither f(x, y) = xy nor f(x, y) = x^2y^2. The graph of f(x, y) = xy is a saddle surface, and f(x, y) = x^2y^2 never takes negative values.

Please replace it with the right ones. If you want to explain something, the content should be coherent: the graph should be the graph of the function you are discussing.
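As a quick numeric sketch of why those two surfaces look different (assuming Python with NumPy; the grid range here is arbitrary, chosen only for illustration):

```python
import numpy as np

# Arbitrary grid around the origin, just for the demo.
xs = np.linspace(-2.0, 2.0, 101)
ys = np.linspace(-2.0, 2.0, 101)
X, Y = np.meshgrid(xs, ys)

saddle = X * Y       # takes both signs: a saddle surface
bowl = X**2 * Y**2   # never negative

print(saddle.min() < 0 < saddle.max())  # True: xy spans negative values
print(bowl.min() >= 0)                  # True: x^2 y^2 is nonnegative
```

This is only a spot check on a finite grid, but it shows that a correct plot of xy must dip below zero, while a correct plot of x^2y^2 must stay at or above it.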

Then, read this Wiki for the definition of a convex function. A practical test is to check whether the second derivative is always positive. Take the second partial derivative (with respect to x, say) of f(x, y) = xy and of f(x, y) = x^2y^2, and tell us (i) whether they are *always* positive, and, if not, (ii) **when** they are positive.

Cheers,

Raymond

No. For f(x, y) = xy, the second partial derivative with respect to x is 0, which is neither positive nor negative; for f(x, y) = x^2y^2 it is 2y^2, which is positive whenever y is not 0.
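Those two results can be checked numerically with a central-difference approximation of the second derivative (a sketch assuming Python; the sample point (1.3, -0.7) is arbitrary):

```python
def d2f_dx2(f, x, y, h=1e-3):
    """Central-difference estimate of the second partial derivative in x."""
    return (f(x + h, y) - 2.0 * f(x, y) + f(x - h, y)) / h**2

f1 = lambda x, y: x * y        # second partial in x is 0
f2 = lambda x, y: x**2 * y**2  # second partial in x is 2*y^2

print(abs(d2f_dx2(f1, 1.3, -0.7)) < 1e-6)                    # True: 0
print(abs(d2f_dx2(f2, 1.3, -0.7) - 2 * (-0.7)**2) < 1e-3)    # True: 2*y^2
```

For a full convexity test in two variables one would check the whole Hessian rather than a single second partial, but this is enough to settle question (i): neither second derivative is *always* positive.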

Maybe you can then update your explanation by incorporating the correct plots and the new findings so far? We can base the discussion on the updated version.

I want you to watch Andrew's lecture and take it into consideration. Think about this: linear activations make multiple layers equivalent to a single layer. Would that **in any way** contradict anything in your current explanation?

No rush on this. It is not always interesting to see a rush job - especially not when the goal is learning.

Raymond

Thanks, Raymond! I will return to this discussion soon.

No need to return too soon. Make it interesting.

Cheers

Yes, it does contradict my explanation when all the hidden layers use linear activations.

If all the hidden layers use linear activations, then the network collapses to a single linear layer, so (with a squared-error loss) the problem is convex - but then it is not really a neural network.
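The collapse of stacked linear layers into one can be shown in a few lines (a sketch assuming Python with NumPy; the layer sizes are made up for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes, chosen only for illustration.
W1 = rng.normal(size=(4, 3))   # first linear layer
W2 = rng.normal(size=(2, 4))   # second linear layer
x = rng.normal(size=(3,))

# Two stacked linear layers...
two_layer = W2 @ (W1 @ x)
# ...are exactly one linear layer with weight matrix W2 @ W1.
one_layer = (W2 @ W1) @ x

print(np.allclose(two_layer, one_layer))  # True
```

This is why products of weights across layers, on their own, do not create non-convexity: without a non-linear activation the whole network is still a single linear map.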

But I thought that in the other cases - that is, outside the all-linear case, **which is not a real neural network** - the multiplication between two weights (w) might lead to non-convexity.

So you admit that a non-linear activation is needed. Right?

@farhana_hossain, collect your thoughts, organize them, and rewrite your explanation, okay? Then we can discuss your latest version, with the right plots.

I won’t return to this thread in the next 24 hours.