I am using the following regex to get all the words. The list `words` already contains all the words in lowercase:
import re
words = [''.join(re.findall('[a-z]', word)) for word in words]
I am getting 6304 unique words instead of the expected 6116. Could you help me figure out what I am missing?
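To see what the comprehension above actually does to individual tokens, here is a minimal sketch. It assumes `words` was built by splitting the lowercase text on whitespace; the question does not say how the list was created, and the sample tokens are made up for illustration.

```python
import re

# Hypothetical sample tokens, assuming `words` came from splitting
# lowercase text on whitespace (the question does not say how it was built).
tokens = ["hello,", "don't", "--", "o'er", "word2"]

# The comprehension from the question: keep only the letters a-z in each token.
cleaned = [''.join(re.findall('[a-z]', word)) for word in tokens]
print(cleaned)  # ['hello', 'dont', '', 'oer', 'word']
```

Two effects stand out: a punctuation-only token such as `--` turns into the empty string, which then counts as one extra "unique word", and apostrophes are simply deleted, so `don't` becomes `dont` instead of being split into separate pieces.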
Hi, Naveen!
I didn't use `findall('[a-z]', word)`; I used a different regex instead. The Detailed Hints (above the `process_data` function) explain the possible patterns to use.
I hope you find my contribution useful.
Lev.
Thank you for your answer. I am wondering why my solution does not give the expected count. There must be an edge case I'm missing, and I would like to know what it is for a better understanding. I also tried `[a-z0-9]`, but that is also wrong.
Hi Naveen_Malla,
See the last sentence of the Detailed Hints section:
The regex pattern is one of those options.
Lev.
I have exactly the same problem and am stuck on it. It would be really helpful to see an example of a word that ends up in my list but should not be there. Otherwise, it is just guesswork.
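One way to find such words yourself is to compare the set produced by the letters-only approach with the set produced by another pattern and print the difference. The sketch below uses `r'\w+'` on the whole text as the comparison pattern. That choice is an assumption: `\w+` is a common option, but this thread does not confirm which pattern the Detailed Hints or the grader expect. The sample text is made up.

```python
import re

# Hypothetical sample text; in the assignment this would be the corpus file.
text = "Don't stop -- the 2nd word isn't o'er yet."

# Approach from the question, assuming the corpus was first split on whitespace.
letters_only = {''.join(re.findall('[a-z]', w)) for w in text.lower().split()}

# Assumed comparison pattern: r'\w+' splits at apostrophes and keeps digits.
word_chars = set(re.findall(r'\w+', text.lower()))

# Words only the letters-only approach produces (candidates for "should not be there").
print(sorted(letters_only - word_chars))  # ['', 'dont', 'isnt', 'nd', 'oer']
# Words only the r'\w+' approach produces.
print(sorted(word_chars - letters_only))  # ['2nd', 'don', 'er', 'isn', 'o', 't']
```

Running the same two set differences on the real corpus should point to the specific tokens (merged contractions, empty strings, digit handling) that account for the gap between the two counts, whichever pattern turns out to be the intended one.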