Not clear how LDA is useful or the best tool

toontalk · August 6, 2023, 5:00pm

When I look at the words in different generated topics I find it hard to see what the theme of each really is. And I worry that LDA is too crude since it ignores word order, handles negation and synonyms poorly, and generally ignores semantics.
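To make the word-order point concrete, here is a minimal sketch using only the Python standard library. It is not the lab's actual preprocessing; the `bag_of_words` helper and the example sentences are made up for illustration. It shows that a bag-of-words representation, which is what LDA takes as input, cannot tell apart two sentences that use the same words in a different order, and that negation shows up only as one extra token.

```python
from collections import Counter

def bag_of_words(text):
    """Lowercase, split on whitespace, and count tokens; word order is discarded."""
    return Counter(text.lower().split())

# Two sentences with opposite meanings but identical word counts.
a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")
same_bag = (a == b)

# Negation is reduced to a single extra token with no link to the word it negates.
pos = bag_of_words("the service was good")
neg = bag_of_words("the service was not good")
extra = neg - pos  # tokens present in neg beyond those in pos

print(same_bag)
print(extra)
```

Note that a typical stop-word list would also drop "not", which would make the last two sentences indistinguishable as well.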

My impression is that there are much better tools today that generate embeddings for entire sentences or posts. These embeddings can be clustered, and the distance from some new text to one of these clusters is a better measure than what LDA provides. An old but freely available example is Google’s Universal Sentence Encoder (on TensorFlow Hub). No doubt newer models are even better.
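As a rough sketch of what "how close some new text is to one of these clusters" could mean, the snippet below scores a new vector against cluster centroids with cosine similarity. Everything here is an assumption for illustration: the 3-dimensional vectors are hand-written stand-ins for real sentence embeddings (a real encoder produces hundreds of dimensions), the cluster names `topic_a` and `topic_b` are hypothetical, and the clustering step itself is taken as already done.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def nearest_cluster(vec, clusters):
    """Return the best-matching cluster name and the similarity to every centroid."""
    scores = {name: cosine(vec, centroid(vs)) for name, vs in clusters.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Toy stand-ins for embeddings of posts that were already grouped into clusters.
clusters = {
    "topic_a": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "topic_b": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}

# Toy stand-in for the embedding of a new, unseen post.
new_post = [0.85, 0.15, 0.05]

best, scores = nearest_cluster(new_post, clusters)
print(best, scores)
```

In practice the vectors would come from a sentence encoder and the clusters from something like k-means; only the scoring step is shown here.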

My impression is that removing stop words, lemmatization, etc. were state of the art many years ago, but are they still competitive with newer techniques?

rykeenan · August 7, 2023, 5:05pm

Thanks for your comment @toontalk! You’re certainly correct that LDA is not a cutting-edge technique. I would love to see different approaches to topic modeling for the data in this lab, so please do share if you explored some other alternatives. In this case, as in many of the other labs in this program, we went with simpler and hopefully more interpretable methods. We did this in hopes of giving less technical folks a means of understanding what’s going on in the labs, but also in the spirit of what Robert talks about: favoring well-understood, more interpretable techniques in crisis situations over SOTA algorithms (“Land Cruiser” solutions). But that’s not to say there’s no room for improvement on the model results, so keep exploring!

toontalk · August 8, 2023, 7:26am

Thanks. That all makes sense. I guess there is a trade-off as to whether the course is teaching technical details that are useful in the short term but are likely to be superseded in the medium and long term.

Also, I think at a minimum the Land Cruiser metaphor should have been discussed in the context of LDA: at least a few sentences saying that it is simpler and better understood than more modern techniques, which have many advantages such as capturing the semantics of the texts much better.
