Hi everyone,
In C2_W4 there is some code for "Cleaning and Tokenization".

When I ran this code, I got this error:
`module 'emoji' has no attribute 'get_emoji_regexp'`
I think it's caused by a newer version of the emoji package,
so I decided to define the function myself:

```python
import re

import emoji


def get_emoji_regexp():
    # Sort emoji by length to make sure multi-character emojis are
    # matched first
    emojis = sorted(emoji.EMOJI_DATA, key=len, reverse=True)
    pattern = '(' + '|'.join(re.escape(u) for u in emojis) + ')'
    return re.compile(pattern)


# Print the tokenized version of the corpus
# (`data` is the token list produced earlier in the notebook)
print(f'Initial list of tokens: {data}')

# Filter tokenized corpus using list comprehension
emoji_regexp = get_emoji_regexp()  # compile once instead of once per token
data = [ch.lower() for ch in data
        if ch.isalpha()
        or ch == '.'
        or emoji_regexp.search(ch)]
```
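The comment about sorting by length matters because Python's `re` tries the alternatives of `a|b` from left to right and takes the first one that matches, not the longest. A minimal stdlib-only sketch of that effect, using a hypothetical two-entry list in place of `emoji.EMOJI_DATA` (a bare heart `\u2764` and the emoji-style heart `\u2764\ufe0f`, which adds a variation selector); `build_pattern` is a made-up helper name:

```python
import re

# Hypothetical mini emoji list standing in for emoji.EMOJI_DATA:
# a bare heart (1 code point) and the emoji-style heart (2 code points).
EMOJIS = ["\u2764", "\u2764\ufe0f"]


def build_pattern(emojis, longest_first=True):
    """Join the emojis into one alternation regex, optionally longest first."""
    ordered = sorted(emojis, key=len, reverse=True) if longest_first else list(emojis)
    return re.compile("(" + "|".join(re.escape(e) for e in ordered) + ")")


text = "I \u2764\ufe0f NLP"
unsorted_matches = build_pattern(EMOJIS, longest_first=False).findall(text)
sorted_matches = build_pattern(EMOJIS).findall(text)

print([len(m) for m in unsorted_matches])  # [1]: only the bare heart matched
print([len(m) for m in sorted_matches])    # [2]: heart plus variation selector
```

Without the sort, the shorter entry wins and the variation selector is left behind as an unmatched character.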

Hi ahmad_hasani,

Thanks for reporting this! `get_emoji_regexp` still works in my notebook, so it seems to depend on which version of the package is installed in the environment. This is certainly something to look into for the next update of the course.

The function was deprecated and then removed in newer versions of the emoji package.
You can use `emoji.emoji_list(ch)` instead.

I have used:
```python
data = [ch.lower() for ch in data
        if ch.isalpha()
        or ch == '.'
        or emoji.emoji_list(ch)]
```

Hello @MartaPL

This is a very old post. Are you still having any issues with the code you shared? Also, just to let you know, this course was updated in December 2023, so if you run into any issues, kindly create a new thread with your query, without sharing any graded assignment code.

