Initializing Cluster centroids for 3D arrays

Basit_Kareem · January 14, 2023, 5:50pm

In the K-means clustering lab, the code imported an image that we ran our trained model on. The image is a 128 x 128 x 3 matrix. However, the code reshaped it to (128*128) x 3, making it a 2D array. It is from this 2D array that we randomly initialize the cluster centroids and run the algorithm.
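For reference, the flatten-then-initialize step described above can be sketched roughly as follows. This is a minimal sketch, not the lab's exact code: the image is a random 128 x 128 x 3 stand-in, K = 16 is an arbitrary choice, and the names `kmeans_init_centroids`, `original_img`, and `X_img` are hypothetical. The initialization shuffles the row indices and takes the first K rows, so each centroid starts as one randomly chosen pixel.

```python
import numpy as np

def kmeans_init_centroids(X, K, seed=None):
    """Hypothetical helper: pick K distinct rows of X at random as initial centroids."""
    rng = np.random.default_rng(seed)
    randidx = rng.permutation(X.shape[0])  # shuffled row (pixel) indices
    return X[randidx[:K]]                  # first K shuffled rows -> shape (K, 3)

# Stand-in for the lab image: random RGB values in [0, 1]
original_img = np.random.default_rng(0).random((128, 128, 3))

# Flatten the pixel grid: one row per pixel, one column per color channel
X_img = np.reshape(original_img, (original_img.shape[0] * original_img.shape[1], 3))
print(X_img.shape)  # (16384, 3)

initial_centroids = kmeans_init_centroids(X_img, K=16, seed=1)
print(initial_centroids.shape)  # (16, 3)
```

Each initial centroid is therefore a 1 x 3 RGB vector, the same shape as a single row of `X_img`.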

However, if I don't want to reshape the matrix from 3D to 2D, how do I initialize the centroids? Do I just select a number of random points from the 3D matrix, just as I would with the 2D matrix?

To illustrate:

Imagine I have a 720 x 1280 x 3 image matrix that I don't want to reshape into a 2D matrix, and that I want to use K = 4 cluster centroids.

Do I just select 4 random points from the 720 examples, meaning each centroid would have a dimension of 1280 x 3?

Or

Do I first select 4 random points and then, for each selected point (which has 1280 x 3 elements), select 1 example out of the 1280, leaving a centroid that is a 1 x 3 vector? I would then repeat this selection for the other 3 of the initial 4 random points.

Which implementation is right?

Or, to put it better: what would the dimension of the cluster centroids for a 720 x 1280 x 3 matrix be?

Muhammad_John_Abbas · January 14, 2023, 8:46pm

Hi @Basit_Kareem

In K-means clustering, the cluster centroids are typically represented as points in the same feature space as the data points that you are trying to cluster. In the case of an image, each pixel can be thought of as a data point with three features (red, green, and blue values).
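Treating pixels as data points does not actually require reshaping: NumPy broadcasting can compare every pixel of an (H, W, 3) array against K centroids of shape (K, 3) directly. Below is a minimal sketch, assuming squared Euclidean distance as in standard K-means; `find_closest_centroids_3d` is a hypothetical helper, and the small random array stands in for a real image.

```python
import numpy as np

def find_closest_centroids_3d(img, centroids):
    """Hypothetical helper: for each pixel, return the index of the nearest centroid.

    img has shape (H, W, 3); centroids has shape (K, 3); result has shape (H, W).
    """
    # (H, W, 1, 3) - (1, 1, K, 3) -> (H, W, K, 3): each pixel minus each centroid
    diff = img[:, :, np.newaxis, :] - centroids[np.newaxis, np.newaxis, :, :]
    # Squared Euclidean distance over the RGB axis -> (H, W, K)
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Index of the closest centroid for every pixel -> (H, W)
    return np.argmin(sq_dist, axis=-1)

img = np.random.default_rng(0).random((72, 128, 3))  # small stand-in image
centroids = np.array([[0.0, 0.0, 0.0],   # black
                      [1.0, 1.0, 1.0],   # white
                      [1.0, 0.0, 0.0],   # red
                      [0.0, 0.0, 1.0]])  # blue
idx = find_closest_centroids_3d(img, centroids)
print(idx.shape)  # (72, 128)
```

The labels match what you would get by reshaping to (H*W) x 3 first and reshaping the result back, because `reshape` keeps NumPy's row-major pixel order. Note that the intermediate (H, W, K, 3) array grows with K, so for large images the flattened 2D form can be more memory-friendly.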

In the case of a 720 x 1280 x 3 matrix, you could select 4 random rows (out of the 720) and use them as the initial cluster centroids. Each centroid would then have a dimension of 1280 x 3. Note, however, that this treats every row of the image as one data point with 1280 x 3 = 3840 features, so you would be clustering whole rows rather than individual pixel colors.
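To make these shapes concrete, the sketch below takes 4 random rows of a stand-in 720 x 1280 x 3 array (random values in place of a real image; `img`, `row_idx`, and `row_centroids` are illustrative names). Each centroid comes out as 1280 x 3, and flattening each row shows that it is really a single point with 3840 features.

```python
import numpy as np

img = np.random.default_rng(0).random((720, 1280, 3))  # stand-in image
K = 4

rng = np.random.default_rng(1)
row_idx = rng.choice(img.shape[0], size=K, replace=False)  # 4 distinct rows out of 720
row_centroids = img[row_idx]
print(row_centroids.shape)  # (4, 1280, 3)

# Equivalent 2D view: each image row becomes a single 3840-feature data point
rows_as_points = img.reshape(img.shape[0], -1)
print(rows_as_points.shape)  # (720, 3840)
```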

Alternatively, for each of 4 randomly selected rows you could pick 1 of its 1280 pixels, which amounts to choosing 4 random pixels, so each centroid would have a dimension of 1 x 3 (one RGB triple). This is the usual choice when clustering colors, and it does not reduce the dimension of the data: reshaping to (H*W) x 3, as the lab does, only lists the same pixels one after another.
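Picking random pixels can also be done directly on the 3D array, without reshaping it. Below is a minimal sketch under these assumptions: the image is a random stand-in, and `init_pixel_centroids` is a hypothetical helper. It draws K distinct flat pixel positions, converts them to (row, column) pairs with `np.unravel_index`, and reads out each pixel's RGB values.

```python
import numpy as np

def init_pixel_centroids(img, K, seed=None):
    """Hypothetical helper: pick K distinct pixels of an (H, W, C) image as centroids."""
    rng = np.random.default_rng(seed)
    H, W, _ = img.shape
    flat_idx = rng.choice(H * W, size=K, replace=False)  # K distinct pixel positions
    rows, cols = np.unravel_index(flat_idx, (H, W))      # back to (row, column) pairs
    return img[rows, cols]  # advanced indexing -> shape (K, C)

img = np.random.default_rng(0).random((720, 1280, 3))  # stand-in image
centroids = init_pixel_centroids(img, K=4, seed=1)
print(centroids.shape)  # (4, 3)
```

Because `unravel_index` uses the same row-major order as `reshape`, `img.reshape(-1, 3)[flat_idx]` would return the same 4 colors, so this is the lab's initialization expressed on the unreshaped array.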

The key thing to remember is that the cluster centroids must be represented in the same feature space as the data points; how you pick the initial centroids within that space depends on the problem and the requirements of your application.

In general, the most common approach is to select K random pixels and use each pixel's RGB values as a centroid's coordinates. Initializing each centroid as the average RGB value of a random subset of pixels is less common, because averaging pulls the initial centroids toward the image's overall mean color, so they start out close together.
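The effect of averaging can be checked with a quick sketch. This compares the two initializations on random stand-in pixel data; `init_subset_mean_centroids` is a hypothetical helper and the subset size of 1000 is an arbitrary assumption. It measures how far each initial centroid lies from the overall mean color.

```python
import numpy as np

def init_subset_mean_centroids(X, K, subset_size, seed=None):
    """Hypothetical variant: each centroid is the mean color of a random subset of pixels.

    X has shape (N, 3), one row per pixel.
    """
    rng = np.random.default_rng(seed)
    centroids = np.empty((K, X.shape[1]))
    for k in range(K):
        subset = rng.choice(X.shape[0], size=subset_size, replace=False)
        centroids[k] = X[subset].mean(axis=0)  # average RGB of the subset
    return centroids

X = np.random.default_rng(0).random((128 * 128, 3))  # stand-in pixels, one row each

# Random-pixel initialization: K distinct pixels used as-is
pixel_init = X[np.random.default_rng(1).choice(X.shape[0], size=4, replace=False)]
# Subset-average initialization
mean_init = init_subset_mean_centroids(X, K=4, subset_size=1000, seed=1)

# Distance of each initial centroid from the overall mean color
d_pixel = np.linalg.norm(pixel_init - X.mean(axis=0), axis=1)
d_mean = np.linalg.norm(mean_init - X.mean(axis=0), axis=1)
print(d_pixel)
print(d_mean)
```

With a subset this large, the averaged centroids land very close to the global mean, while randomly chosen pixels stay spread across the color space.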

Hope this answers your question.

Regards:
Muhammad John Abbas
