I wrote this code:
(Solution code removed, as posting it publicly is against the honour code of this community, regardless of whether it is correct or not.)
Then I ran the provided cell:
```python
#DO NOT MODIFY THIS CELL
word_l = process_data('./data/shakespeare.txt')
vocab = set(word_l) # this will be your new vocabulary
print(f"The first ten words in the text are: \n{word_l[0:10]}")
print(f"There are {len(vocab)} unique words in the vocabulary.")
```
It prints:
```
The first ten words in the text are:
['o', 'for', 'a', 'muse', 'of', 'fire', 'that', 'would', 'ascend', 'the']
There are 6303 unique words in the vocabulary.
```
but the expected output is:
```
The first ten words in the text are:
['o', 'for', 'a', 'muse', 'of', 'fire', 'that', 'would', 'ascend', 'the']
There are 6116 unique words in the vocabulary.
```
UPDATE:
Solved. I just needed to use re.findall instead of all the extra processing I was doing.
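For anyone hitting the same mismatch, here is a toy illustration of why the tokenizer changes the vocabulary count. It is not the assignment solution, and the sample sentence is made up. Splitting on whitespace leaves punctuation glued to words, so `fire,` and `fire.` become separate vocabulary entries, while `re.findall(r"\w+", ...)` keeps only runs of word characters. Whether this explains the exact gap between 6303 and 6116 is an assumption, since the original code isn't shown.

```python
import re

# Hypothetical sample text (not taken from shakespeare.txt) with punctuation attached to words
text = "O for a Muse of fire, that would ascend\nThe brightest heaven of invention! O, fire."

# Naive tokenization: lowercase, then split on whitespace; punctuation stays glued to words
split_tokens = text.lower().split()

# Regex tokenization: keep only runs of word characters, so punctuation is dropped
regex_tokens = re.findall(r"\w+", text.lower())

print(sorted(set(split_tokens)))
print(sorted(set(regex_tokens)))
print(len(set(split_tokens)), len(set(regex_tokens)))
```

Here the whitespace split yields 15 unique tokens against 13 for the regex. Note that `\w+` also matches digits and underscores and splits contractions such as `don't` into `don` and `t`, which may or may not be what a given grader expects.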