I was thinking about clustering algorithms like K-means, which use distance to find similar groups among the data samples. It seems to me that having more data samples doesn’t necessarily improve clustering performance. In some cases, the number of samples available for clustering is fixed, for example the clients of the services that a company provides. The goal is to group clients according to the products they buy.
I want to understand clustering better:
Does clustering need training data, or does it just help find patterns in the given data samples?
For a company with only 20 clients, how can they improve the clustering result? I am not sure whether finding more features will help.
For a time series data set, will historical data help improve clustering performance? For example, the company stores the purchase records of these 20 clients for the last 3 years.
Centroid-based clustering such as K-means groups the data according to distance from centroids. It is unsupervised, so it needs no labeled training data; it looks for structure in the samples you give it.
It all depends on the data you provide and on what distinguishes one client from another. The clustering algorithm will form clusters either way; more features might help, depending on how much they contribute to separating the clients.
Ideally yes, if it’s a parameter of the clustering algorithm.
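To make the "distance from centroids" idea concrete, here is a minimal pure-Python sketch of K-means. The client data, the `kmeans` helper and its parameters are made up for illustration; it is not a tuned implementation, just enough to show that the algorithm works directly on the samples it is given, with no labels.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Plain K-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its points, until assignments stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen samples
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

# Hypothetical data: (purchases of product A, purchases of product B) per client.
clients = [(1, 9), (2, 8), (1, 10), (9, 1), (10, 2), (8, 1)]
labels, centroids = kmeans(clients, k=2)
print(labels)
print(centroids)
```

The cluster numbers themselves are arbitrary; what matters is which clients end up sharing a label. In practice a library implementation would probably be the better choice, but the logic is the same.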
Thanks. How is time series data for 20 clients normally used in clustering? Should each series be treated independently by the clustering algorithm? There are only 20 clients, though. Is there any example that I can refer to? Thank you.
I would say create a feature that holds the time in years that the client has been with the company. I don’t think you can use the time series directly as it is!
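As a rough sketch of that idea, the purchase history can be collapsed into one row of features per client. Everything here is hypothetical: the record layout, the client and product names, the reference date `today`, and the extra per-product spend totals, which I added only because the goal is to group clients by what they buy.

```python
from collections import defaultdict
from datetime import date

# Hypothetical purchase log: (client, purchase date, product, amount).
records = [
    ("acme", date(2021, 1, 15), "A", 100.0),
    ("acme", date(2023, 6, 1), "B", 50.0),
    ("bolt", date(2022, 3, 10), "A", 20.0),
    ("bolt", date(2023, 3, 10), "A", 30.0),
]
products = ["A", "B"]
today = date(2024, 1, 15)  # assumed reference date for computing tenure

def client_features(records, products, today):
    """Return {client: [tenure_years, spend on each product in `products`]}."""
    first_seen = {}
    spend = defaultdict(lambda: defaultdict(float))
    for client, day, product, amount in records:
        # Earliest purchase date stands in for when the client joined.
        first_seen[client] = min(day, first_seen.get(client, day))
        spend[client][product] += amount
    features = {}
    for client, start in first_seen.items():
        tenure_years = (today - start).days / 365.25
        features[client] = [tenure_years] + [spend[client][p] for p in products]
    return features

features = client_features(records, products, today)
for client, row in features.items():
    print(client, row)
```

Each row can then be fed to a clustering algorithm as one sample. Since tenure in years and spend amounts are on very different scales, it is usually worth rescaling the columns first, otherwise the largest-valued feature dominates the distance.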