@here2infinity I see your point here.
It makes sense when they give us "happi", but I can't think of any other words besides what you suggested that would start with "sunni" either, and I agree that example is a little confusing...
But as they also mention, the gist here is that it is looking for the most "common" stems. Thus, for the variants of the word "happy", "happi" comes up more often.
By incorrect spellings of the root word, they mean words like happier, happiness, etc. The real "root word" here is "happy", but that's not what the stemming algorithm comes up with, so in that sense it "misspells" the root.
I believe the stemming algorithm is also taking the underlying word vector into account.
Thus their point: "happ" might be an even better stem, but it is also too close (in spelling) to "happen". Recall that word vectors (with regard to cosine distance) exhibit some shared relationship between similar words. But when we stem, it is not "just" a matter of spelling.
This, at least, is my understanding.
*I'll leave what I have just stated for further edification, but I was wrong about word vectors being involved at all. I clicked the link they provide in the lab for the Porter Stemming algorithm and started going through the Python translation:
https://tartarus.org/martin/PorterStemmer/python.txt
It basically removes common suffixes and also keeps a running count of vowel-consonant pairs to determine where to stem.
Even though I have taken NLP, and for quite a while was an ESL teacher, unfortunately one thing I am not is a linguist... So off the top of my head I couldn't tell you what the significance of that measurement is or how it is used.
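For anyone who wants to poke at it, here is a rough sketch of how I read the vowel-consonant count (Porter calls it the "measure", m) and the one rule that turns "happy" into "happi". This is my own simplified rewrite, not the code from the link: the names `is_consonant`, `measure`, `contains_vowel` and `step_1c` are made up for illustration, it assumes lowercase input, and it only covers this one step of the original algorithm. Library implementations may behave differently.

```python
VOWELS = "aeiou"

def is_consonant(word, i):
    """True if word[i] is a consonant in Porter's sense."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        # 'y' is a consonant at the start of a word or after a vowel,
        # and a vowel when it follows a consonant (as in "happy").
        return i == 0 or not is_consonant(word, i - 1)
    return True

def measure(stem):
    """Count vowel-to-consonant transitions: the m in [C](VC)^m[V]."""
    m = 0
    prev_was_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_was_vowel:
            m += 1
        prev_was_vowel = not cons
    return m

def contains_vowel(stem):
    return any(not is_consonant(stem, i) for i in range(len(stem)))

def step_1c(word):
    """Simplified sketch of one rule: a final 'y' becomes 'i'
    if the rest of the word contains a vowel."""
    if word.endswith("y") and contains_vowel(word[:-1]):
        return word[:-1] + "i"
    return word

for w in ["tree", "trouble", "troubles"]:
    print(w, measure(w))
for w in ["happy", "sunny", "sky"]:
    print(w, "->", step_1c(w))
```

As far as I can tell, the measure acts as a guard on the suffix rules: a suffix is only stripped if what is left over has a large enough m, so that short words are not chopped down to nothing. The 'y' to 'i' rule is where "happi" and "sunni" come from, which is why they look misspelled.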