Emoji.get_emoji_regexp().search(ch)

ahmad_hasani · September 17, 2022, 7:36am

Hi everyone,

In C2_W4 there is code for "Cleaning and Tokenization".

When I ran this code I got this error:
module 'emoji' has no attribute 'get_emoji_regexp'
I think it is caused by a newer version of the package, so I decided to run it this way:

import re
import emoji

def get_emoji_regexp():
    # Sort emoji by length to make sure multi-character emojis are
    # matched first
    emojis = sorted(emoji.EMOJI_DATA, key=len, reverse=True)
    pattern = '(' + '|'.join(re.escape(u) for u in emojis) + ')'
    return re.compile(pattern)

# Print the tokenized version of the corpus
print(f'Initial list of tokens: {data}')

# Filter tokenized corpus using list comprehension
data = [ch.lower() for ch in data
        if ch.isalpha()
        or ch == '.'
        or get_emoji_regexp().search(ch)
        ]
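
One thing to note about the workaround above: get_emoji_regexp() is called inside the list comprehension, so the sort and join over the whole emoji table are repeated for every token. Below is a minimal sketch of the same idea with the pattern built once. The names build_emoji_regexp, clean_tokens and SAMPLE_EMOJIS are made up for illustration, and the three-emoji sample only stands in for emoji.EMOJI_DATA so that the snippet runs without the package installed.

```python
import re

def build_emoji_regexp(emojis):
    """Compile one regex that matches any string in `emojis`."""
    # Longest first, so multi-character emojis win over their prefixes.
    ordered = sorted(emojis, key=len, reverse=True)
    return re.compile("(" + "|".join(re.escape(e) for e in ordered) + ")")

# Hypothetical sample; with the emoji package installed you would pass
# emoji.EMOJI_DATA here instead.
SAMPLE_EMOJIS = ["😀", "👍", "👍🏽"]
EMOJI_RE = build_emoji_regexp(SAMPLE_EMOJIS)  # built once, reused per token

def clean_tokens(tokens, pattern=EMOJI_RE):
    """Keep alphabetic tokens, full stops, and tokens containing an emoji."""
    return [
        tok.lower()
        for tok in tokens
        if tok.isalpha() or tok == "." or pattern.search(tok)
    ]

tokens = ["Hello", ",", "world", "👍🏽", ".", "42"]
print(clean_tokens(tokens))
```

The filtering logic is unchanged from the post; only the place where the pattern is compiled differs.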

reinoudbosch · September 19, 2022, 10:37pm

Hi ahmad_hasani,

Thanks for reporting this! get_emoji_regexp still works in my notebook, so it seems to depend on the active version of the package in the environment. Certainly something to look into with the next update of the course.

Fang_Wang · October 27, 2022, 10:54pm

The function has been deprecated and removed in newer versions of the package.
You can use emoji.emoji_list(ch) instead.
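
Since the replies above suggest the behaviour depends on which version of the package is installed, one option is to pick the detection function at runtime. This is only a sketch: make_emoji_checker is a hypothetical helper, and fake_new is a stand-in object used so the snippet runs without the package. In the notebook you would `import emoji` and pass the real module instead.

```python
from types import SimpleNamespace

def make_emoji_checker(emoji_module):
    """Return a token -> bool function using whichever API the module offers."""
    if hasattr(emoji_module, "emoji_list"):
        # Newer releases: emoji_list returns one entry per emoji found,
        # so an empty list means "no emoji in this token".
        return lambda token: bool(emoji_module.emoji_list(token))
    if hasattr(emoji_module, "get_emoji_regexp"):
        # Older releases: fetch the compiled pattern once and reuse it.
        pattern = emoji_module.get_emoji_regexp()
        return lambda token: pattern.search(token) is not None
    raise AttributeError("no known emoji-detection function found")

# Stand-in for the real package (assumption, for demonstration only).
fake_new = SimpleNamespace(
    emoji_list=lambda s: [{"emoji": c} for c in s if c == "👍"]
)
has_emoji = make_emoji_checker(fake_new)
print(has_emoji("👍"), has_emoji("word"))
```

The list comprehension can then use `or has_emoji(ch)` as its last condition, whichever version is active.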

MartaPL · July 24, 2024, 8:56am

I have used:
data = [ch.lower() for ch in data
        if ch.isalpha()
        or ch == '.'
        or emoji.emoji_list(ch)
        ]

Deepti_Prasad · July 24, 2024, 9:56am

Hello @MartaPL

This is a very old post. May I ask whether you have any issue with the code you shared? Also, just to let you know, this course was updated in December 2023, so if there are any issues, kindly create a new thread with your query, without sharing any code that is graded in your assignment.

Regards
DP
