Problem obtaining unique words

I am using the following regex to get all the words. The list words already has all words in lowercase.

words = [''.join(re.findall('[a-z]', word)) for word in words]

I am getting 6304 unique words as output instead of the expected 6116 words. Please help me with what I am missing.

Hi, Naveen!

I didn't use findall('[a-z]', word); I used a different regex instead. The Detailed Hints (above the def process_data function) explain the possible patterns to use.

I hope you find my contribution useful.


Thank you for your answer. I am wondering why my solution is not giving the expected result. Obviously there is an edge case that I'm missing. I want to know what it is so I can understand it better. I also tried [a-z0-9], but that's wrong too.

Hi Naveen_Malla,

In the Detailed Hint section, the last sentence:

The regex pattern is one of those options.


I have exactly the same problem and am stuck on it. It would be really helpful to see an example of a word that is on my list but should not be. Otherwise, it is kind of guessing.
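
One way to stop guessing is to compare the two approaches on a small sample and look at the words only one of them produces. The sketch below assumes the list was built by splitting on whitespace, and it uses \w+ on the whole text as the alternative pattern. Both are assumptions, since the thread does not confirm the expected pattern, and the sample text is made up.

```python
import re

# Hypothetical sample text; the real corpus from the assignment is not shown here.
text = "don't stop the well-known show at 9 - it's the show"

# Approach from the question (assuming the list came from a whitespace split):
# keep only the a-z characters of each token and glue them back together.
tokens = text.lower().split()
per_token = {''.join(re.findall('[a-z]', tok)) for tok in tokens}

# Alternative (an assumption, not confirmed as the expected pattern):
# extract runs of word characters from the whole text.
whole_text = set(re.findall(r'\w+', text.lower()))

# Words produced only by the per-token approach.
only_per_token = per_token - whole_text
print(sorted(only_per_token))  # ['', 'dont', 'its', 'wellknown']
```

Filtering characters inside a token glues the pieces together ("don't" becomes "dont", "well-known" becomes "wellknown"), and a token with no letters becomes an empty string that still counts as a unique word. Extracting matches from the whole text splits at those characters instead. Running the same set difference on the real word list should show which words account for the gap in the count.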