How do different centroid initializations in K-means result in drastically different clusters? They all share a common cost function

Hi there,

please bear in mind that a training process is in general not deterministic but rather stochastic, e.g. due to:

  • sampling from your training data in mini-batches, as in stochastic gradient descent
  • dropout (where neurons are randomly "shut off") for regularization
  • initialization (if you sample the initial values from a distribution)
  • etc.

So, in general, there is some randomness in your training process.

Depending on your actual problem and the mechanics of your loss function, you might encounter several local minima. Here your solver can get "stuck" due to the randomness described above [1]. If you run the optimization several times, you might end up in different optima. This is exactly the K-means case: the within-cluster sum of squares is non-convex, and Lloyd's algorithm only converges to a local minimum, so the final clusters depend on where the centroids started.
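To make this concrete, here is a minimal pure-NumPy sketch of Lloyd's algorithm (the standard K-means iteration) run from several random initializations on the same toy data; the data, seeds, and function name are my own illustration, not from the question. Runs that start badly can settle at a higher final cost (inertia) than runs that start well:

```python
import numpy as np

def kmeans(X, k, rng, n_iter=100):
    """Lloyd's algorithm sketch: returns (labels, centroids, inertia)."""
    # Random initialization: pick k distinct data points as centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged to a (local) minimum of the cost
        centroids = new_centroids
    # Final assignment and cost (within-cluster sum of squares).
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    inertia = (dists.min(axis=1) ** 2).sum()
    return labels, centroids, inertia

# Three well-separated blobs; only the initialization differs between runs.
data_rng = np.random.default_rng(0)
X = np.vstack([data_rng.normal(loc=c, scale=0.3, size=(20, 2))
               for c in ([0, 0], [5, 5], [0, 5])])
inertias = [kmeans(X, 3, np.random.default_rng(seed))[2] for seed in range(10)]
print(sorted(round(i, 2) for i in inertias))
```

If any of the printed inertias differ, those runs ended in different local minima of the very same cost function. This is why libraries such as scikit-learn restart K-means several times (`n_init`) and keep the best result.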

Note: you can prevent this by fixing a pseudo-random seed to ensure repeatability [2]. (At least you will then get an identical result from your optimization routine every time, but it is not clear whether that result is the global optimum. Frankly speaking, in practice this might not even be required, as long as your model fulfils your business needs and is robust enough [does not overfit].)

Useful links:

Best
Christian