Hi everyone,
In C2_W4 there is code for "Cleaning and Tokenization":

When I ran this code I got this error:
module 'emoji' has no attribute 'get_emoji_regexp'
I think it is caused by a recent update of the emoji package,
so I decided to define the function myself and run it this way:

import re
import emoji

def get_emoji_regexp():
    # Sort emoji by length to make sure multi-character emojis are
    # matched first
    emojis = sorted(emoji.EMOJI_DATA, key=len, reverse=True)
    pattern = '(' + '|'.join(re.escape(u) for u in emojis) + ')'
    return re.compile(pattern)

# Print the tokenized version of the corpus
print(f'Initial list of tokens: {data}')

# Filter tokenized corpus using list comprehension
data = [ch.lower() for ch in data
        if ch.isalpha()
        or ch == '.'
        or get_emoji_regexp().search(ch)
        ]

Hi ahmad_hasani,

Thanks for reporting this! get_emoji_regexp still works in my notebook, so it seems to depend on the active version of the package in the environment. Certainly something to look into with the next update of the course.
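
Since the behaviour seems to depend on the installed version, a quick way to compare environments is to print the package version in each notebook. This is only a minimal sketch using the standard library; the helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Compare this value between a notebook where the code works and one where it fails.
print(installed_version("emoji"))
```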

The function has been deprecated and removed in newer versions of the package.
You can use emoji.emoji_list(ch) instead.