K-mean question

gmazzaglia · July 6, 2024, 11:57pm

In the k-means formula:

Why is the formula expressed with a vector when the point is represented as a matrix? point = [x1, x2]

Nevermnd · July 7, 2024, 2:10am

Sorry, I'm not sure what the issue (or, in a sense, the difference) is in representing a point as a vector?

Strictly speaking, [x1, x2] is a vector, not a matrix, because, at least in those terms, it has only one 'dimension' (a single axis of values).

Secondly, I believe it is more technically correct to refer to it as a vector, given the linear algebra methods we apply to it. At its simplest, a vector can be seen as a reference to a point.
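To make the distinction concrete, here is a minimal sketch in plain Python. The names `point`, `X`, `centroid`, and `squared_distance` are illustrative assumptions, not taken from the course material or the screenshot. A single point is a flat list of p values (a vector), while a data set stacks n such points as rows (an n-by-p matrix):

```python
# A single point with p = 2 features: one axis of values, so it is a vector.
point = [1.0, 2.0]

# A data set of n = 3 points. Each row is one point (a vector);
# the rows together form an n-by-p matrix.
X = [
    [1.0, 2.0],
    [4.0, 6.0],
    [0.0, 0.0],
]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


# A centroid is also a vector of length p, so it can be compared
# against any row of X.
centroid = [1.0, 1.0]
distances = [squared_distance(row, centroid) for row in X]
print(distances)  # [1.0, 34.0, 2.0]
```

So the formula is written with vectors because it operates on one point (one row) at a time, even when all the points are stored together in a matrix.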

Deepti_Prasad · July 7, 2024, 3:57am

Hi @gmazzaglia

Good question. The screenshot you shared explains that K-means is a vector quantization method: an iterative process assigns each data point to a group, and the data points gradually become clustered based on similar features.

Also, a vector is an array of numerical values that expresses the location of a point along several dimensions.

For example, in MATLAB, idx = kmeans(X,k) performs k-means clustering to partition the observations of the n-by-p data matrix X into k clusters, and returns an n-by-1 vector (idx) containing cluster indices of each observation. Rows of X correspond to points and columns correspond to variables.

By default, kmeans uses the squared Euclidean distance metric and the k-means++ algorithm for cluster center initialization.

Cluster indices, returned as a numeric column vector. idx has as many rows as X, and each row indicates the cluster assignment of the corresponding observation.

idx = kmeans(X,k,Name,Value) returns the cluster indices with additional options specified by one or more Name,Value pair arguments.

For example, specify the cosine distance, the number of times to repeat the clustering using new initial values, or to use parallel computing.

[idx,C] = kmeans(___) returns the k cluster centroid locations in the k-by-p matrix C.

Cluster centroid locations, returned as a numeric matrix. C is a k-by-p matrix, where row j is the centroid of cluster j. The location of a centroid depends on the distance metric specified by the Distance name-value argument.

[idx,C,sumd] = kmeans(___) returns the within-cluster sums of point-to-centroid distances in the k-by-1 vector sumd.

Within-cluster sums of point-to-centroid distances, returned as a numeric column vector. sumd is a k-by-1 vector, where element j is the sum of point-to-centroid distances within cluster j. By default, kmeans uses the squared Euclidean distance (see 'Distance' metrics).

[idx,C,sumd,D] = kmeans(___) returns distances from each point to every centroid in the n-by-k matrix D.

Distances from each point to every centroid, returned as a numeric matrix. D is an n-by-k matrix, where element (j,m) is the distance from observation j to centroid m. By default, kmeans uses the squared Euclidean distance (see 'Distance' metrics).
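To show how the shapes of idx, C, sumd, and D relate to each other, here is a rough sketch in plain Python. It is not MATLAB's implementation: the function names `kmeans_sketch`, `assign`, and `squared_distance` are hypothetical, the initial centroids are passed in by hand instead of being chosen by k-means++, and cluster indices are 0-based rather than 1-based.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign(X, C):
    """Return (idx, D): nearest-centroid index per point and the n-by-k distances."""
    # D[i][j] is the squared distance from point i to centroid j.
    D = [[squared_distance(x, c) for c in C] for x in X]
    # idx[i] is the 0-based index of the centroid closest to point i.
    idx = [row.index(min(row)) for row in D]
    return idx, D


def kmeans_sketch(X, C, max_iter=100):
    """Basic k-means iterations starting from the given centroids C."""
    C = [list(c) for c in C]
    for _ in range(max_iter):
        idx, _ = assign(X, C)
        new_C = []
        for j, c in enumerate(C):
            members = [x for x, i in zip(X, idx) if i == j]
            if members:
                # New centroid: column-wise mean of the cluster's points.
                new_C.append([sum(col) / len(members) for col in zip(*members)])
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_C.append(c)
        if new_C == C:
            break
        C = new_C
    # Recompute assignments against the final centroids so outputs agree.
    idx, D = assign(X, C)
    # sumd[j] is the sum of squared distances within cluster j (length k).
    sumd = [sum(D[i][j] for i in range(len(X)) if idx[i] == j)
            for j in range(len(C))]
    return idx, C, sumd, D


# n = 4 points, p = 2 variables, k = 2 clusters.
X = [[1.0, 1.0], [1.0, 2.0], [8.0, 8.0], [9.0, 8.0]]
idx, C, sumd, D = kmeans_sketch(X, [[0.0, 0.0], [10.0, 10.0]])
print(idx)   # [0, 0, 1, 1]
print(C)     # [[1.0, 1.5], [8.5, 8.0]]
print(sumd)  # [0.5, 0.5]
```

Here idx has one entry per row of X, C has one row per cluster with p values each, sumd has one entry per cluster, and D has n rows and k columns.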

Regards
DP

gmazzaglia · July 7, 2024, 10:36pm

Thanks @Deepti_Prasad, very clear.

This is what I wanted to understand:

idx = kmeans(X,k) performs k-means clustering to partition the observations of the n-by-p data matrix X into k clusters, and returns an n-by-1 vector (idx) containing cluster indices of each observation. Rows of X correspond to points and columns correspond to variables.

Regards.
Gus
