Help understanding clustering better

In the last slide of the first video, Prof. Andrew showed some applications of clustering. But I find it hard to come up with a list of features that I could use for the “Grouping similar news” application.

Shall I use each word in a news article as a feature and count how often it occurs? Would word frequency (word count divided by total words) be better? I feel this would create too many features. What other features do I need to consider? And how do I map the resulting clusters back to the links of the news articles?
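To make the two options concrete, here is a minimal sketch of raw word counts versus word frequencies. The helper names (`tokenize`, `word_counts`, `word_frequencies`, `to_vector`) and the two sample headlines are made up for illustration and are not from the course; the tokenizer is deliberately naive.

```python
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def word_counts(text):
    # Raw count of each word in one article.
    return Counter(tokenize(text))


def word_frequencies(text):
    # Word count divided by the total number of words in the article.
    counts = word_counts(text)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()} if total else {}


def to_vector(features, vocabulary):
    # One number per vocabulary word, 0 when the word is absent from the article.
    return [features.get(word, 0) for word in vocabulary]


# Hypothetical sample headlines, only to show the shape of the features.
articles = [
    "Stocks rise as markets rally",
    "Markets fall as stocks slide and investors sell stocks",
]

# The vocabulary is every distinct word across all articles,
# which is why the number of features grows so quickly.
vocabulary = sorted({word for article in articles for word in tokenize(article)})

count_vectors = [to_vector(word_counts(a), vocabulary) for a in articles]
freq_vectors = [to_vector(word_frequencies(a), vocabulary) for a in articles]

print(vocabulary)
print(count_vectors)
print(freq_vectors)
```

Each article becomes one row with one column per vocabulary word, so frequencies put long and short articles on the same scale but do nothing to reduce the number of features.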

I feel it is hard to apply what was taught in the lecture to real world applications. Some more details or references would be helpful.

I think you need to extract the topic of the discussion and then apply clustering to the topics.

This gets heavily into the practices of Natural Language Processing. That’s a separate and complicated topic all its own.

For this introductory course, it’s far easier to consider clustering only for numerical features.

@jfchen, I agree with Tom that a 4-minute introductory video is not enough to make its audience able to apply what it covers. The course was not intended to teach how to group similar news; that was mentioned as an example of how people use clustering, so that learners can explore further on their own. I always think it is powerful to even just mention an example, because it becomes a lead we can follow to achieve something we want. Whether we achieve it, however, is up to us. If we never even heard the name of that example, we could never pursue it.

You have pointed out some interesting ideas: word frequency and feature engineering. If you are interested in modeling text, then as Tom said, you would need NLP courses, which can easily be a series of courses that takes months to complete. I believe you can find some popular ones by searching on Coursera.

Raymond

Thank you all for your comments.

Now if I consider “clustering only for numerical features”, I have a question:
Can the number of features be just 1? For example, a temperature sensor that emits only one temperature value at 2 pm each day, or the closing price of a stock. Does it make sense to apply clustering to this kind of time series data, which contains only one feature?

We can do it, but whether it makes sense depends on your purpose. For example, what problem are you solving? If clustering that one-feature dataset solves your problem, then it makes sense. The algorithm does not give meaning to what we do; we give it the meaning.
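As a rough illustration that clustering does run on a single feature, here is a minimal sketch that assumes K-means as the clustering algorithm. The function name `kmeans_1d`, the deterministic initialization, and the temperature readings are all made up for this example.

```python
def kmeans_1d(values, k, iterations=100):
    # Assumption: start from evenly spaced picks in the sorted data
    # instead of random initialization, so the result is repeatable.
    ordered = sorted(values)
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]

    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)

        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids

    labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
    return centroids, labels


# Hypothetical daily 2 pm temperature readings (one feature per example).
temps = [14.8, 15.1, 15.5, 24.9, 25.3, 25.6, 15.0, 25.1]
centroids, labels = kmeans_1d(temps, k=2)
print(centroids)
print(labels)
```

Note that this treats each reading as an independent point and ignores the order in time. Whether the two groups mean anything (say, cool days versus warm days) still depends on the problem being solved.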

You could use clustering with a single feature, but in that case it’s probably easier to use a statistical method (like a histogram) and visually assess the distribution of the values.
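A minimal sketch of the histogram approach, using only the standard library and printing a text bar per bin. The `histogram` helper, the choice of 4 equal-width bins, and the temperature readings are assumptions for illustration.

```python
def histogram(values, num_bins):
    # Equal-width bins spanning the observed minimum and maximum.
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * num_bins
    for v in values:
        # The maximum value would land one past the last bin, so clamp it.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


# Hypothetical daily 2 pm temperature readings.
temps = [14.8, 15.1, 15.5, 24.9, 25.3, 25.6, 15.0, 25.1]
edges, counts = histogram(temps, num_bins=4)

# Print one text bar per bin so the distribution can be assessed by eye.
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:6.1f} to {right:6.1f} | {'#' * count}")
```

Empty bins between two populated ones suggest separate groups of values, which is the same information a one-feature clustering would give.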