This is from the "Choosing the activation function" chapter.
The minimum point of the cost function of linear regression was completely flat, so the derivative of J was 0 at that point.
But I think the points in the image are not fully flat. If they were, gradient descent would stop once the derivative of J reached 0, but Prof. Andrew marks those points as if gradient descent keeps running.
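For reference, my reasoning is based on the usual gradient descent update (written from memory, so the notation is my own rather than exactly the slide's):

```latex
w := w - \alpha \frac{\partial J(w,b)}{\partial w}, \qquad
b := b - \alpha \frac{\partial J(w,b)}{\partial b}
```

If the derivative is exactly 0 at a point, the update leaves w and b unchanged, which is why I would expect gradient descent to stop there.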
Please correct me if there is anything wrong with my understanding.
For the purposes of minimization, it doesn't really matter. The curve shown could be for either classification or regression; both have similar shapes.
This is discussed starting at 0:52 in that video.
If the output activation is sigmoid, then it’s the logistic cost function.
If the output activation is linear, then it’s MSE.
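For concreteness, here are the two cost functions in the course's usual notation (written from memory, so treat the exact symbols as my paraphrase):

```latex
J_{\text{logistic}}(\vec{w},b) = -\frac{1}{m}\sum_{i=1}^{m}
  \left[\, y^{(i)}\log f(\vec{x}^{(i)}) + \left(1-y^{(i)}\right)\log\!\left(1-f(\vec{x}^{(i)})\right) \right]

J_{\text{MSE}}(\vec{w},b) = \frac{1}{2m}\sum_{i=1}^{m}\left(f(\vec{x}^{(i)}) - y^{(i)}\right)^{2}
```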
The logarithm in the cost function ensures that it is convex and therefore ensures convergence to the global minimum. But the graph isn't representing the global minimum.
I'm trying to picture this in my head; why is it not convex?
If there are 2 neurons in layer 1 and 3 inputs (x0, x1, x2) in layer 0, then each of the two neurons' weight vectors w0 and w1 is 1x3. So the cost function of the output of layer 1 depends on a 3x2 matrix of weights (one column per neuron, with the 3 rows coming from w0 and w1). That makes it more complicated and makes the cost function non-convex; see the sketch below.
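Here is a small toy sketch of that idea (my own example, not from the course): 3 inputs, 2 hidden sigmoid units, 1 sigmoid output, trained with the logistic cost. Swapping the two hidden units (together with the matching output weights) gives a different point in weight space with exactly the same cost, which is the usual intuition for why the cost surface of a network with a hidden layer is not convex.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W1, b1, W2, b2, X, y):
    # Forward pass: hidden layer, then output layer, then the logistic cost.
    A1 = sigmoid(X @ W1.T + b1)           # shape (m, 2): one activation per hidden unit
    a2 = sigmoid(A1 @ W2.T + b2).ravel()  # shape (m,): sigmoid output
    return -np.mean(y * np.log(a2) + (1 - y) * np.log(1 - a2))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))               # 5 examples, 3 features (x0, x1, x2)
y = np.array([0, 1, 1, 0, 1])

W1 = rng.normal(size=(2, 3))              # one 1x3 weight row per hidden unit
b1 = rng.normal(size=2)
W2 = rng.normal(size=(1, 2))              # output weights, one per hidden unit
b2 = rng.normal(size=1)

# Swap the two hidden units (rows of W1, entries of b1) and, to keep the
# overall network function identical, swap the matching columns of W2 too.
perm = [1, 0]
print(cost(W1, b1, W2, b2, X, y))
print(cost(W1[perm], b1[perm], W2[:, perm], b2, X, y))   # same cost, different weights
```

Both printed values come out identical even though the weight matrices differ, so the cost surface has multiple equally good configurations instead of a single bowl.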
Oh! I thought it was correct, since I found that when you expand the final layer's activation function, the expression contains the final layer's w multiplied by the w of the previous layers.
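What I had in mind is roughly this (using a linear activation in layer 1 just to keep the algebra simple; this is my own simplification, not the exact course derivation):

```latex
a^{[2]} = g\!\left(W^{[2]} a^{[1]} + b^{[2]}\right)
        = g\!\left(W^{[2]}\left(W^{[1]}\vec{x} + b^{[1]}\right) + b^{[2]}\right)
        = g\!\left(W^{[2]}W^{[1]}\vec{x} + W^{[2]}b^{[1]} + b^{[2]}\right)
```

Because the cost then depends on the product W^{[2]}W^{[1]}, it is not convex jointly in the two weight matrices (even a simple product like w1*w2 is non-convex), which is what I was trying to say.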