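I was testing the code from C2_W4_lecture_data_preparation on my computer and found that `emoji.get_emoji_regexp()` was removed in version 2.0.0 of the `emoji` package. Here is a workaround.
This replacement function builds a regex that matches any emoji listed in `emoji.EMOJI_DATA`: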
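```python
import re
import emoji

def get_emoji_regexp():
    # Longest emojis first, so multi-codepoint sequences match before their prefixes
    emojis = sorted(emoji.EMOJI_DATA, key=len, reverse=True)
    pattern = '(' + '|'.join(re.escape(u) for u in emojis) + ')'
    return re.compile(pattern)
```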
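Sorting by length matters because Python's `re` tries the alternatives of `a|b|...` from left to right and keeps the first one that matches. The sketch below uses only the standard library and a hypothetical two-item sample (a thumbs-up with and without a skin-tone modifier) instead of the real `emoji.EMOJI_DATA`; `build_alternation` is an illustrative helper, not part of any library.

```python
import re

def build_alternation(items, longest_first=True):
    # Optionally sort so that longer strings are tried before their prefixes
    ordered = sorted(items, key=len, reverse=True) if longest_first else list(items)
    return re.compile('(' + '|'.join(re.escape(s) for s in ordered) + ')')

# Hypothetical sample: thumbs-up, and thumbs-up + medium skin tone (2 code points)
thumbs = '\U0001F44D'
thumbs_medium = '\U0001F44D\U0001F3FD'
samples = [thumbs, thumbs_medium]

unsorted_match = build_alternation(samples, longest_first=False).search(thumbs_medium).group()
sorted_match = build_alternation(samples).search(thumbs_medium).group()

print(unsorted_match == thumbs)        # True: the shorter prefix wins
print(sorted_match == thumbs_medium)   # True: the full sequence wins
```

Without the sort, the modifier would be left behind as a separate character.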
Then change the token-cleaning code to use it:
```python
emoji_regexp = get_emoji_regexp()  # compile the large pattern once, not once per token

print(f'Initial list of tokens: {data}')
data = [ch.lower() for ch in data
        if ch.isalpha()
        or ch == '.'
        or emoji_regexp.search(ch)  # this is the new line
        ]
print(f'After cleaning: {data}')
```
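To try the cleaning step outside the notebook, here is a self-contained sketch. It assumes a small hypothetical `SAMPLE_EMOJI_DATA` dict in place of `emoji.EMOJI_DATA` (only the keys are used, as in the function above), and `build_emoji_regexp` and `clean_tokens` are illustrative names, not from the course code.

```python
import re

# Hypothetical stand-in for emoji.EMOJI_DATA: only the keys (emoji strings) matter
SAMPLE_EMOJI_DATA = {'\U0001F44D': {}, '\U0001F600': {}, '\u2764\ufe0f': {}}

def build_emoji_regexp(emoji_data):
    # Same construction as get_emoji_regexp(), but with the data passed in
    emojis = sorted(emoji_data, key=len, reverse=True)
    return re.compile('(' + '|'.join(re.escape(u) for u in emojis) + ')')

def clean_tokens(tokens, emoji_regexp):
    # Keep alphabetic words, full stops, and any token that contains an emoji
    return [ch.lower() for ch in tokens
            if ch.isalpha() or ch == '.' or emoji_regexp.search(ch)]

emoji_regexp = build_emoji_regexp(SAMPLE_EMOJI_DATA)  # compiled once, reused per token
tokens = ['Who', '\u2764\ufe0f', "'s", 'learning', '\U0001F600', '!', '.']
cleaned = clean_tokens(tokens, emoji_regexp)
print(cleaned)
```

Emojis are not alphabetic (`'\U0001F600'.isalpha()` is `False`), which is why the regex check is needed to keep them; punctuation such as `!` and `'s` is dropped.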