K-means cost function

I want to visualize the cost function of the k-means problem so that I can see how it has local minima and how the algorithm might get stuck in one, but I’m still not sure how to do that.
I tried generating simple uniform 1-D data and calculated the cost for every possible pair of centers, but the resulting cost function looked convex to me (even though it had two global minima, but I suppose that is just because the centers get swapped, since the function is symmetric).
I guess that if the data is uniform the algorithm always converges (which makes sense in a way). Maybe I should try it with only 2 centers while the data actually has 3 clusters, i.e. it is dense in 3 regions instead of only 2? Or are there other scenarios I can try?
Also, I suppose it is impossible to visualize the cost function for 2-D data? Or are there tricks that can be done?
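A sketch of that “dense in 3 regions” experiment (the blob locations, sizes, and noise level below are my own assumptions): with three blobs of unequal size but only 2 centers, the WCSS surface over the pair (c1, c2) develops basins of different depths.

```python
import numpy as np

# 1-D data that is dense in three regions (unequal sizes, so the
# basins of the cost surface have different depths), fit with k = 2.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-3.0, 0.2, 100),
                    rng.normal(0.0, 0.2, 100),
                    rng.normal(3.0, 0.2, 50)])

def wcss(c1, c2):
    # each point pays the squared distance to its nearest center
    return np.minimum((X - c1) ** 2, (X - c2) ** 2).sum()

# evaluate the cost on a grid of (c1, c2) pairs
grid = np.linspace(-4, 4, 161)
J = np.array([[wcss(c1, c2) for c2 in grid] for c1 in grid])

# Two qualitatively different solutions, both approximately fixed points
# of Lloyd's algorithm, but with different costs -- a genuine local minimum:
print(wcss(-3.0, 1.0))   # one center on the left blob, one sharing the rest
print(wcss(-1.5, 3.0))   # one center shared by the two left blobs (worse)
# visualize with e.g.:  import matplotlib.pyplot as plt
#                       plt.contourf(grid, grid, J, 50); plt.show()
```

The asymmetry (50 points in the right blob instead of 100) is what makes the two basins unequal, so the algorithm really can get stuck in the worse one.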

K-means doesn’t really have a ‘cost function’ because it isn’t a supervised method.

We’re just assigning examples to the nearest centroid using the least squares distances.

There are no local minima.


Just because it is not a supervised learning problem doesn’t mean that there is no cost function, no? It uses the WCSS (within-cluster sum of squares), which is the objective function we are trying to minimize (I thought that objective and cost functions are the same thing, but I may be wrong). This is the function that I would like to visualize.
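For reference, the objective being minimized (what the lectures call the distortion) can be written as:

```latex
J\bigl(c^{(1)},\dots,c^{(m)},\mu_1,\dots,\mu_K\bigr)
  = \frac{1}{m}\sum_{i=1}^{m}\bigl\lVert x^{(i)} - \mu_{c^{(i)}}\bigr\rVert^{2}
```

where $c^{(i)}$ is the index of the cluster that example $x^{(i)}$ is assigned to and $\mu_k$ are the centroids; up to the constant factor $1/m$ this is exactly the WCSS.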

Also, in the lecture “Initializing K-means”, Professor Ng does say that the algorithm we use to optimize the objective function might get stuck in a local minimum. He gives an example with a 2-D dataset and k = 3 at around the third minute of the lecture, and it is easy to replicate such a scenario, but I don’t know how I can visualize the function for a problem with that many dimensions.
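Replicating that scenario numerically is possible even without visualizing the surface: run Lloyd’s algorithm from different initializations and compare the final distortions. A minimal sketch (the blob layout and the two initializations are my own made-up example, not the one from the lecture):

```python
import numpy as np

def kmeans(X, centers, iters=100):
    """Plain Lloyd's algorithm; returns final centers and WCSS."""
    centers = centers.astype(float).copy()
    for _ in range(iters):
        # squared distance of every point to every center, shape (n, k)
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for j in range(len(centers)):
            if np.any(labels == j):        # leave empty clusters in place
                centers[j] = X[labels == j].mean(axis=0)
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return centers, d.min(axis=1).sum()

rng = np.random.default_rng(0)
# three well-separated 2-D blobs, clustered with k = 3
X = np.vstack([rng.normal(c, 0.3, size=(50, 2))
               for c in [(0.0, 0.0), (5.0, 0.0), (2.5, 4.0)]])

good_init = np.array([[0.0, 0.0], [5.0, 0.0], [2.5, 4.0]])
# crowd two centroids into the first blob: the third centroid ends up
# stranded between the other two blobs -- a local minimum of the cost
bad_init = np.array([[0.0, 0.0], [0.1, 0.1], [3.5, 2.0]])

_, cost_good = kmeans(X, good_init)
_, cost_bad = kmeans(X, bad_init)
print(cost_good, cost_bad)  # the bad init converges to a much higher cost
```

Running many random initializations and keeping the lowest-cost result is exactly the fix the lecture recommends.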

You cannot effectively visualize any function that has more than 3 dimensions.

That’s why I tried to visualize it for 2 centers only, with 1-D data, but I got a convex-looking function for the sample data that I used.
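One caveat worth checking: a surface with two separate global minima cannot actually be convex. A quick midpoint test along the lines of that experiment (the uniform data on [-1, 1] and the center locations are my own assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, 500)  # simple uniform 1-D sample

def wcss(c1, c2):
    # cost for two candidate centers: squared distance to the nearer one
    return np.minimum((X - c1) ** 2, (X - c2) ** 2).sum()

# the two symmetric minimizers (same centers, swapped order) ...
a, b = (-0.5, 0.5), (0.5, -0.5)
# ... and their midpoint, which collapses both centers onto 0
mid = (0.0, 0.0)

# convexity would require J(mid) <= (J(a) + J(b)) / 2, but J(mid) is far
# larger, so the surface only *looks* bowl-shaped near each minimum
print(wcss(*mid), (wcss(*a) + wcss(*b)) / 2)
```

So even in the uniform case the cost function is non-convex; it just happens that every local minimum is also global there, which is why Lloyd’s algorithm never gets stuck on that data.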

Convex functions are good, because they have no local minima.