Problem obtaining unique words

Naveen_Malla · December 26, 2022, 6:09pm

I am using the following regex to get all the words. The list words already contains every word in lowercase.

words = [''.join(re.findall('[a-z]', word)) for word in words]

I am getting 6304 unique words as output instead of the expected 6116 words. Please help me with what I am missing.

LevValenzuela · December 28, 2022, 4:48am

Hi, Naveen!

I didn't use findall('[a-z]', word); I used a different regex instead. The Detailed Hints (just above the def process_data function) explain the possible patterns to use.

I hope you find my contribution useful.

Lev.

Naveen_Malla · December 29, 2022, 12:44pm

Thank you for your answer. I am wondering why my solution does not give the expected count. Obviously there is an edge case that I'm missing, and I want to know what it is so I understand the problem better. I also tried [a-z0-9], but that is wrong as well.
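As an illustration of the kind of edge case in question, the sketch below applies the character filter from the original post to a few tokens and compares the result with a word-level pattern. The sample tokens, the helper name keep_letters, and the \w+ pattern are all assumptions made for this example; the thread does not confirm which pattern the assignment expects.

```python
import re

def keep_letters(token):
    """Keep only a-z characters, as the list comprehension in the question does."""
    return "".join(re.findall("[a-z]", token))

# Hypothetical tokens; the course corpus is not reproduced here.
samples = ["don't", "well-known", "1999", "rule,"]

for tok in samples:
    glued = keep_letters(tok)            # punctuation removed, pieces glued together
    pieces = re.findall(r"\w+", tok)     # runs of word characters, kept separate
    print(repr(tok), "->", repr(glued), "vs", pieces)
```

With the character filter, "don't" becomes "dont" and "well-known" becomes "wellknown", while a purely numeric token collapses to an empty string. A word-level pattern instead yields separate pieces such as "well" and "known". In a large text those pieces probably already occur on their own, so the glued forms could add extra entries to the vocabulary, which may be one reason for a higher unique count.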

LevValenzuela · December 29, 2022, 9:28pm

Hi Naveen_Malla,

Look at the last sentence of the Detailed Hints section.

The regex pattern to use is one of the options listed there.

Lev.

Dorota_Kowalska · April 21, 2023, 7:55am

I have exactly the same problem and am stuck on it. It would be really helpful to see an example of a word that is on my list but should not be. Otherwise, it is just guessing.
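One way to find such words without guessing is to build the vocabulary twice, once with each approach, and take the set difference. The following is a minimal sketch under stated assumptions: the function names, the sample text, and the default \w+ pattern are hypothetical and are not taken from the assignment.

```python
import re

def vocab_by_char_filter(text):
    """Vocabulary from whitespace tokens with everything except a-z removed."""
    tokens = text.lower().split()
    return {"".join(re.findall("[a-z]", tok)) for tok in tokens}

def vocab_by_pattern(text, pattern=r"\w+"):
    """Vocabulary from a word-level pattern (assumed here, not confirmed)."""
    return set(re.findall(pattern, text.lower()))

# Hypothetical sample; replace it with the contents of the real corpus file.
sample = "It's a so-called test. It's only a test, no. 2!"

mine = vocab_by_char_filter(sample)
other = vocab_by_pattern(sample)

extra = sorted(mine - other)      # words only the character filter produces
missing = sorted(other - mine)    # words only the word-level pattern produces

print("extra:", extra)
print("missing:", missing)
```

Running the same comparison on the full corpus lists the exact words that differ between the two vocabularies, which shows whether apostrophes, hyphens, digits, or empty strings account for the gap.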
