How do different initializations of K-means centroids result in drastically different clusters? They all share a common cost function

In the lecture, we studied that there are local optima and a global optimum, and that one initialization can lead to a local optimum while another can lead to the global one.

I do not understand how that happens. Is there a mathematical proof of that?

Hi, @Tanvi

It has to do with the data itself. If your clusters have well-defined boundaries between them, you can be confident that you will find (almost) the same centroids every time, and the result will depend only on the parameter k.

But if your data does not have clear boundaries between the clusters, initializing the centroids randomly will give different results depending on the initialization.

Recall that you initialize the centroids randomly and keep updating them until they settle at a good point.

My suggestion is to use the random_state parameter with a fixed value to guarantee that you get the same centers each time you fit your model:

from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3, random_state=42)
...
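To illustrate the point about initialization, here is a minimal sketch (assuming scikit-learn is installed; the dataset is synthetic, generated with make_blobs purely for illustration). With n_init=1 each run keeps a single random initialization, so the final inertia (the K-means cost) can differ from seed to seed, i.e. different runs settle in different local optima of the same cost function:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# overlapping blobs: no clear boundaries between the clusters
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=2.5, random_state=0)

# one random initialization per run (n_init=1), a different seed each time
for seed in range(5):
    km = KMeans(n_clusters=4, init="random", n_init=1, random_state=seed).fit(X)
    print(f"seed={seed}  inertia={km.inertia_:.1f}")

If the printed inertias differ, those runs converged to different local optima; increasing n_init makes K-means try several initializations and keep the one with the lowest inertia.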

@Tanvi, is there any mechanism for us to move away from a local optimum?

Hi there,

Please bear in mind that a training process is in general not deterministic but rather stochastic, e.g. due to:

  • sampling from your training data (mini-batches), i.e. stochastic gradient descent
  • dropout (where neurons are randomly "shut off") for regularization
  • initialization (if you sample from a distribution)
  • etc.

So, in general, there is some randomness in your training process.

Depending on your actual problem and the mechanics behind your loss function, you might encounter several local minima. Here your solver can get "stuck" due to the randomness described above [1]. If you run the training several times, you might end up in different optima.

Note: you can prevent this by choosing a pseudo-random seed to ensure repeatability [2]. (At least you will get the identical result from your optimization routine. But it's not clear whether or not this is the global optimum. Frankly speaking, in reality this might not even be required, as long as your model is sufficient in fulfilling your business needs and is robust enough [does not overfit].)
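For illustration, a minimal sketch of what fixing a pseudo-random seed can look like in practice (assuming NumPy is installed; the frameworks in the comments are optional, and the value 42 is arbitrary):

import random
import numpy as np

random.seed(42)      # Python's built-in RNG
np.random.seed(42)   # NumPy's global RNG (also used by scikit-learn when no random_state is passed)

# deep learning frameworks keep their own RNGs, which must be seeded separately, e.g.:
# import tensorflow as tf; tf.random.set_seed(42)
# import torch; torch.manual_seed(42)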

Useful links:

Best
Christian

In addition, solvers like Adam (Adaptive Moment Estimation) utilize the "momentum" of the optimization trajectory to overcome local minima and, ideally, reach the global optimum eventually.
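As a toy sketch of that momentum idea (plain momentum rather than Adam itself, on an invented 1-D loss): the accumulated velocity can carry the iterate across a shallow local minimum that plain gradient descent, started from the same point with the same learning rate, would settle in.

def grad(w):
    # gradient of the non-convex toy loss f(w) = w**4 - 4*w**2 + w,
    # which has a shallow minimum near w = 1.37 and a deeper one near w = -1.47
    return 4 * w**3 - 8 * w + 1

w, v = 2.5, 0.0           # start in the basin of the shallow minimum
lr, beta = 0.01, 0.9      # learning rate and momentum coefficient
for _ in range(200):
    v = beta * v - lr * grad(w)   # accumulate a velocity ("momentum")
    w = w + v                     # step along the velocity, not just the current gradient
print(w)                  # should print a value close to -1.47, the deeper minimum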


Best
Christian


Hi, Raymond

Recently I started taking a course titled Convex Optimization for Machine Learning, where "convex" describes functions that have only a single minimum, which implies that it is the global minimum. In the lectures I learned that not every function is convex by nature, but there is a way to convert a function that is not convex on one set into a function that can be considered convex on another set. One of the methods uses the convex hull, defined as the smallest convex set that contains the function. If we can find such a hull, we can then convert our function to a convex one and find its global minimum easily.
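A purely formal sketch of that idea, using the standard definition of the convex envelope (which I assume is what the course calls the convex hull of a function):

\[
\operatorname{conv} f \;=\; \sup\{\, g : g \text{ is convex},\; g \le f \,\},
\]

i.e. the largest convex function lying below \(f\); its epigraph is (essentially) the convex hull of \(\operatorname{epi} f\). Minimizing \(\operatorname{conv} f\) gives a lower bound on \(\min f\), and the two coincide when \(f\) is already convex.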

Hello @Moaz_Elesawey,

Convex relaxation? Does the course give a simple / practical example that shows, before and after applying the technique, what difference it makes? Can you share the example?

Raymond

Hi Raymond

For now, we have only just scratched the surface of Convex Optimization; all of what we have covered is theoretical and very mathematical (which I am not very good at :sweat_smile:).

If you want the materials and notes, I have them in this GitHub Repo

I really don't have any practical example in mind right now, but the ultimate goal of this technique is to enable us to find the minimum more easily because, as I said, convex means the function has only one minimum, which means it is the global one.

Hello Moaz,

Thank you for sharing the repo with me. I read the notes and it seems the best is yet to come; so far those chapters only seem to be setting the stage. The Lagrange multiplier is very interesting stuff :wink: and I learnt it when I studied Lagrangian Mechanics.

Raymond

Hi, Raymond

You're welcome. You're totally right, we have not yet gone very deep into the subject; we barely touched it in the last lecture.

I hope it will be easy to study, as my mathematical foundation is not that great.

Best regards,
Moaz

Moaz, let me know if there is any maths stuff in those chapters that you want to discuss. :wink:

Raymond


Thank you very much Raymond, I really appreciate it. I will let you know if I need any help.

Best regards,
Moaz

Hello @Moaz_Elesawey,

I came across this video explaining the Lagrange multiplier visually.

Raymond


Hi, @rmwkwok

I am very thankful to you and I really appreciate your help. I will certainly watch it.
Thank you again.

Sincerely,
Moaz


Hello @Moaz_Elesawey,

Me again. I have recently been reading about the Lagrange multiplier again and suddenly realized that regularization terms can be seen as an example of using a Lagrange multiplier. I found this discussion on Google.
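Roughly, as a sketch of the standard argument (writing \(J(w)\) for the unregularized cost, which is just my notation here): constrain the weights, e.g. \(\lVert w \rVert_2^2 \le t\), and form the Lagrangian

\[
\mathcal{L}(w, \lambda) \;=\; J(w) + \lambda \left( \lVert w \rVert_2^2 - t \right).
\]

For a fixed multiplier \(\lambda \ge 0\), minimizing this over \(w\) is the same as minimizing the familiar penalized cost \(J(w) + \lambda \lVert w \rVert_2^2\), since the constant term \(-\lambda t\) does not change the minimizer.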

I think it's nice to connect two topics that we have learnt.

Cheers,
Raymond
