# K-means question

In the k-means formula:

Why is the formula expressed as a vector when the point is represented as a matrix? point = [x1, x2]

Sorry, I’m not sure what the issue (or, in a sense, the difference) is in representing a point as a vector?

Or, strictly speaking, [x1, x2] is a vector, not a matrix, because, at least in those terms, it has only one ‘dimension’ (a single row of values).

Secondly, I believe it is more technically correct to refer to it as a vector, given the linear algebra methods we apply to it. At its simplest, a vector can be seen as a reference to a point.
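To make the vector/matrix distinction concrete, here is a minimal sketch in plain Python. The values and the helper name `squared_distance` are made up for illustration and are not taken from the course material. A single point is a vector, and a data set is a matrix whose rows are such vectors.

```python
# A single point with two features is a vector: a flat list of numbers.
point = [1.0, 2.0]  # [x1, x2]

# A data set is a matrix: one row per point, one column per feature.
X = [
    [1.0, 2.0],
    [1.5, 1.8],
    [8.0, 8.0],
]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


# Each row of the matrix is itself a vector, so the same vector
# operation can be applied row by row.
distances = [squared_distance(row, point) for row in X]
print(distances)
```

So the formula works on one vector at a time, while the matrix is just the collection of all those vectors stacked as rows.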

Good question. The screenshot you shared describes K-means as a vector quantization method: an iterative process that assigns each data point to a group, so that data points gradually get clustered based on similar features.

Also, a vector is an array of numerical values that expresses the location of a point along several dimensions.

1. `idx = kmeans(X,k)` performs k-means clustering to partition the observations of the n-by-p data matrix `X` into `k` clusters, and returns an n-by-1 vector (`idx`) containing cluster indices of each observation. Rows of `X` correspond to points and columns correspond to variables.

By default, `kmeans` uses the squared Euclidean distance metric and the k-means++ algorithm for cluster center initialization.

Cluster indices, returned as a numeric column vector. `idx` has as many rows as `X`, and each row indicates the cluster assignment of the corresponding observation.
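As a rough illustration of what `idx` contains, here is a sketch of the assignment step alone in plain Python. It is not the `kmeans` implementation: the centroids are fixed, hypothetical values, and `assign_clusters` is a name invented for this example. Labels are 1-based to match the description above.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign_clusters(X, centroids):
    """Return one 1-based cluster index per row of X (nearest centroid)."""
    idx = []
    for row in X:
        dists = [squared_distance(row, c) for c in centroids]
        idx.append(dists.index(min(dists)) + 1)
    return idx


# n = 4 observations (rows), p = 2 variables (columns)
X = [[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [9.0, 9.5]]

# Hypothetical centroids, chosen by hand for the example (k = 2)
centroids = [[1.0, 2.0], [8.5, 9.0]]

idx = assign_clusters(X, centroids)
print(idx)
```

The result has one entry per row of `X`, which is why `idx` is described as an n-by-1 vector.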

1. `idx = kmeans(X,k,Name,Value)` returns the cluster indices with additional options specified by one or more `Name,Value` pair arguments.

For example, specify the cosine distance, the number of times to repeat the clustering using new initial values, or the use of parallel computing.

1. `[idx,C] = kmeans(___)` returns the `k` cluster centroid locations in the `k`-by-p matrix `C`.

Cluster centroid locations, returned as a numeric matrix. `C` is a `k`-by-p matrix, where row j is the centroid of cluster j. The location of a centroid depends on the distance metric specified by the `Distance` name-value argument.
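For the default squared Euclidean distance, each centroid is the mean of the points assigned to that cluster. A minimal sketch in plain Python, assuming the assignments `idx` are already known and every cluster has at least one point (`compute_centroids` is a hypothetical helper, not part of any library):

```python
def compute_centroids(X, idx, k):
    """Return a k-by-p list of lists; row j is the mean of cluster j + 1."""
    p = len(X[0])
    sums = [[0.0] * p for _ in range(k)]
    counts = [0] * k
    for row, label in zip(X, idx):
        counts[label - 1] += 1
        for col in range(p):
            sums[label - 1][col] += row[col]
    # Assumes no cluster is empty, otherwise this divides by zero.
    return [[s / counts[j] for s in sums[j]] for j in range(k)]


X = [[1.0, 2.0], [2.0, 4.0], [8.0, 8.0], [10.0, 10.0]]
idx = [1, 1, 2, 2]  # 1-based cluster index for each row of X

C = compute_centroids(X, idx, k=2)
print(C)
```

Each row of `C` is again a vector with p components, the same shape as a single data point.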

1. `[idx,C,sumd] = kmeans(___)` returns the within-cluster sums of point-to-centroid distances in the `k`-by-1 vector `sumd`.

Within-cluster sums of point-to-centroid distances, returned as a numeric column vector. `sumd` is a `k`-by-1 vector, where element j is the sum of point-to-centroid distances within cluster j. By default, `kmeans` uses the squared Euclidean distance (see `'Distance'` metrics).
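The same idea can be sketched for `sumd`: add up, per cluster, the squared Euclidean distance from each point to its own centroid. This is illustrative plain Python with made-up data, and `within_cluster_sums` is a hypothetical helper name.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def within_cluster_sums(X, idx, C):
    """Return a list of length k; element j sums distances within cluster j + 1."""
    sumd = [0.0] * len(C)
    for row, label in zip(X, idx):
        sumd[label - 1] += squared_distance(row, C[label - 1])
    return sumd


X = [[1.0, 2.0], [2.0, 4.0], [8.0, 8.0], [10.0, 10.0]]
idx = [1, 1, 2, 2]             # 1-based cluster index for each row of X
C = [[1.5, 3.0], [9.0, 9.0]]   # centroids (means of each cluster)

sumd = within_cluster_sums(X, idx, C)
print(sumd)
```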

1. `[idx,C,sumd,D] = kmeans(___)` returns distances from each point to every centroid in the n-by-`k` matrix `D`.

Distances from each point to every centroid, returned as a numeric matrix. `D` is an n-by-`k` matrix, where element (j, m) is the distance from observation j to centroid m. By default, `kmeans` uses the squared Euclidean distance (see `'Distance'` metrics).
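Finally, a small sketch of how a matrix like `D` could be built by hand: one row per observation and one column per centroid. This is plain Python with illustrative values, and `distance_matrix` is a hypothetical helper, not a library function.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def distance_matrix(X, C):
    """Return an n-by-k list of lists of squared distances (rows: points)."""
    return [[squared_distance(row, c) for c in C] for row in X]


X = [[1.0, 2.0], [2.0, 4.0], [8.0, 8.0], [10.0, 10.0]]  # n = 4, p = 2
C = [[1.5, 3.0], [9.0, 9.0]]                            # k = 2, p = 2

D = distance_matrix(X, C)

# The column holding the smallest value in each row gives that point's
# nearest centroid (converted here to a 1-based label).
nearest = [row.index(min(row)) + 1 for row in D]
print(D)
print(nearest)
```

Here the data `X` and the outputs `C` and `D` are matrices, while each individual point, each centroid, and the list of labels are vectors.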

Regards
DP

Thanks @Deepti_Prasad, very clear.

This is what I wanted to understand:

idx = kmeans(X,k) performs k-means clustering to partition the observations of the n-by-p data matrix X into k clusters, and returns an n-by-1 vector (idx) containing cluster indices of each observation. Rows of X correspond to points and columns correspond to variables.

Regards.
Gus