Help understand cluster better

jfchen · February 23, 2023, 7:35am

In the last slide of the first video, Prof. Andrew showed some applications of clustering. But I find it hard to come up with a list of features that I could use for clustering in the “Grouping similar news” example.

Should I use each word in a news article as a feature, with its count as the value? Would word frequency (word count divided by total words) be better? I feel this would create too many features. What other features do I need to consider? And how do I link the resulting clusters back to the links of the news articles?

I feel it is hard to apply what was taught in the lecture to real world applications. Some more details or references would be helpful.
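The word-frequency idea in the question can be sketched as follows. This is a minimal illustration using only the Python standard library, not a recommended NLP pipeline: the headlines, the `news/…` identifiers, and the helper names `term_frequencies` and `cosine_similarity` are all made up for the example. Each article becomes one vector with one entry per vocabulary word, and keeping the vectors in a dict keyed by the article link is one simple way to map a cluster back to its articles.

```python
import math
from collections import Counter

# Hypothetical toy headlines; each key stands in for a link to the article.
articles = {
    "news/1": "stocks rally as markets close higher",
    "news/2": "markets fall as stocks slide",
    "news/3": "team wins final match of the season",
    "news/4": "coach praises team after match win",
}

def term_frequencies(text, vocabulary):
    """Word count divided by total words, one entry per vocabulary word."""
    words = text.lower().split()
    counts = Counter(words)
    return [counts[w] / len(words) for w in vocabulary]

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# One feature per distinct word across all articles.
vocabulary = sorted({w for text in articles.values() for w in text.lower().split()})

# Feature vectors stay keyed by link, so a cluster can be traced back to articles.
features = {link: term_frequencies(text, vocabulary) for link, text in articles.items()}

print(len(vocabulary), "features per article")
print("finance vs finance:", cosine_similarity(features["news/1"], features["news/2"]))
print("finance vs sports: ", cosine_similarity(features["news/1"], features["news/3"]))
```

The number of features equals the vocabulary size, which is why it grows quickly on real articles. The two finance headlines share words and score higher than a finance and a sports headline, which share none; vectors like these are what a clustering algorithm would then group.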

gent.spah · February 23, 2023, 10:29am

I think you need to extract the topic of each article first and then apply clustering to those topics.

TMosh · February 23, 2023, 7:44pm

This gets heavily into the practices of Natural Language Processing. That’s a separate and complicated topic all its own.

For this introductory course, it’s far easier to consider clustering only for numerical features.

rmwkwok · February 24, 2023, 1:23am

@jfchen, I agree with Tom that a 4-minute introductory video is not enough to make its audience able to apply what it covers. The course was not intended to teach how to group similar news; that was mentioned as an example of how people use clustering, so that learners can explore further themselves. I always think it is powerful even just to mention an example, because it becomes a lead we can follow to achieve something we want. Whether we achieve it is up to us, but if we never even hear the name of that example, we can never pursue it.

You have pointed out some interesting ideas: word frequency and feature engineering. If you are interested in modeling text, then as Tom said, you would need NLP courses, which can easily be a series of courses that takes months to complete. I believe you can find some popular ones by searching on Coursera.

Raymond

jfchen · February 24, 2023, 3:50am

Thank you all for your comments.

Now if I consider “clustering only for numerical features”, I have a question:
Can the number of features be just 1? For example, a temperature sensor that emits only one temperature value at 2 pm each day, or the closing price of a stock. Does it make sense to apply clustering to this kind of time series data, which contains only one feature?

rmwkwok · February 24, 2023, 4:13am

We can do it, but whether it makes sense depends on your purpose. For example, what problem are you solving? If clustering that dataset of just one feature can solve your problem, then it makes sense. The algorithm does not give meaning to what we do; we give it the meaning.

TMosh · February 24, 2023, 5:04am

You could use clustering with a single feature, but in that case it’s probably easier to use a statistical method (like a histogram) and visually assess the distribution of the values.
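Both options can be tried on a single feature in a few lines. The sketch below uses made-up temperature readings and two hypothetical helpers, `kmeans_1d` and `histogram`, written with only the standard library; the choice of two clusters, the starting centroids (the minimum and maximum reading), and the 5-degree bin width are assumptions for the example.

```python
from collections import Counter

# Hypothetical daily 2 pm temperature readings in degrees Celsius.
temperatures = [14.0, 15.5, 15.0, 14.5, 27.0, 28.5, 29.0, 27.5]

def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on a single feature, starting from the given centroids."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for v in values:
            # Assign each value to the index of its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids, groups

centroids, groups = kmeans_1d(temperatures, [min(temperatures), max(temperatures)])
print("centroids:", centroids)

def histogram(values, bin_width):
    """Count values per bin, keyed by the lower edge of each bin."""
    return Counter(int(v // bin_width) * bin_width for v in values)

bins = histogram(temperatures, bin_width=5)
for edge in sorted(bins):
    # Text histogram: one '#' per reading in the bin (empty bins are skipped).
    print(f"{edge:>3}-{edge + 5:<3} {'#' * bins[edge]}")
```

On this toy data the histogram already shows two separated groups, which is the point made above. Note that neither approach looks at the order of the readings, so for a time series they describe the distribution of values, not how the values change over time.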
