I would like to know whether the feature extraction algorithm used in week one can be extended to multi-class classification, that is, whether it still makes sense in that case.
Hi Mauricio_Toro,
You can use CountVectorizer for multi-class classification. See, e.g., this post.
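As a rough sketch of what that could look like, assuming scikit-learn is installed: the toy texts, the three labels, and the choice of LogisticRegression as the classifier are all made up for illustration and are not taken from the course.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data with three classes (illustration only).
texts = [
    "the team won the match",
    "a great game last night",
    "the election results are in",
    "parliament votes on the new law",
    "the new phone has a great camera",
    "the laptop ships with a faster chip",
]
labels = ["sports", "sports", "politics", "politics", "tech", "tech"]

# Bag-of-words counts: one row per text, one column per vocabulary word.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# The same counts feed a classifier that handles more than two classes.
model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)
prediction = model.predict(["who won the game"])
```

The feature matrix X has one column per unique word, so it grows with the vocabulary, not with the number of classes.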
But do you mean to use TF-IDF? The problem with TF-IDF is that it produces a very sparse feature matrix. Let’s say I have 50 classes and 30,000 unique words. I was wondering whether I can sum the frequencies for each of the 50 classes and get just 50 features to represent each text, as a generalisation of the sum of frequencies presented in week 1 for binary classification?
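To make the idea concrete, here is a minimal sketch of what I have in mind, in plain Python. The function names build_freqs and extract_features, the whitespace tokenisation, and the toy data are my own assumptions for illustration; the bias term is left out.

```python
from collections import Counter

def build_freqs(texts, labels):
    """Count how often each word appears in texts of each class."""
    freqs = Counter()
    for text, label in zip(texts, labels):
        for word in text.lower().split():
            freqs[(word, label)] += 1
    return freqs

def extract_features(text, freqs, classes):
    """One feature per class: the summed class frequency of the text's words."""
    words = text.lower().split()
    return [sum(freqs[(word, c)] for word in words) for c in classes]

# Hypothetical toy corpus with three classes (it would be 50 in my case).
texts = [
    "the match was a great game",
    "the election results are in",
    "new phone has a great camera",
]
labels = ["sports", "politics", "tech"]
classes = sorted(set(labels))  # ['politics', 'sports', 'tech']

freqs = build_freqs(texts, labels)
features = extract_features("great game", freqs, classes)
```

With K classes this gives K features per text regardless of vocabulary size, so 30,000 unique words and 50 classes would still yield a dense vector of length 50.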
Hi Mauricio_Toro,
As I indicated in my response to another question you asked, I feel CountVectorizer is used here mainly as a pedagogical tool. The fact that it can be used for multi-class classification does not mean it is the most efficient option. You will find in the rest of the specialization that more efficient ways exist to extract features.