C1_W1: Natural Language Processing with Classification and Vector Spaces, Exercise 5

I got an error from the process_tweet function, even though all the tests for this function passed earlier in the assignment.
print([type(tweet) for tweet in test_x[:5]])
TypeError Traceback (most recent call last)
in
----> 1 tmp_accuracy = test_logistic_regression(test_x, test_y, freqs, theta)
2 print(f"Logistic regression model's accuracy = {tmp_accuracy:.4f}")

in test_logistic_regression(test_x, test_y, freqs, theta, predict_tweet)
18 for tweet in test_x:
19 # get the label prediction for the tweet
---> 20 y_pred = predict_tweet(test_y, freqs, theta)
21
22 if y_pred > 0.5:

in predict_tweet(tweet, freqs, theta)
12
13 # extract the features of the tweet and store it into x
---> 14 x = extract_features(tweet, freqs)
15
16 # make the prediction using x and theta

in extract_features(tweet, freqs, process_tweet)
9 '''
10 # process_tweet tokenizes, stems, and removes stopwords
---> 11 word_l = process_tweet(tweet)
12
13 # 3 elements for [bias, positive, negative] counts

~/work/utils.py in process_tweet(tweet)
19 stopwords_english = stopwords.words('english')
20 # remove stock market tickers like $GE
---> 21 tweet = re.sub(r'\$\w*', '', tweet)
22 # remove old style retweet text "RT"
23 tweet = re.sub(r'^RT[\s]+', '', tweet)

/opt/conda/lib/python3.7/re.py in sub(pattern, repl, string, count, flags)
190 a callable, it's passed the Match object and must return
191 a replacement string to be used."""
--> 192 return _compile(pattern, flags).sub(repl, string, count)
193
194 def subn(pattern, repl, string, count=0, flags=0):

TypeError: cannot use a string pattern on a bytes-like object

In that call to predict_tweet (line 20 of test_logistic_regression), you are supposed to pass one tweet as the first argument, but you are passing test_y, the array of labels for all the tweets. That is why you get that error message.
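
To see why a label array produces this particular message rather than something more obvious, here is a minimal sketch using only the standard library. The names are assumptions for illustration: strip_tickers is a hypothetical stand-in for the first substitution in process_tweet, and array.array stands in for the NumPy label array test_y. Both expose the buffer protocol, so re treats them as bytes-like and rejects them when the pattern is a str.

```python
import re
from array import array


def strip_tickers(tweet):
    # Hypothetical stand-in for the first re.sub call in process_tweet:
    # removes stock market tickers like $GE.
    return re.sub(r'\$\w*', '', tweet)


# Stands in for test_y (an assumption): numeric labels, not a tweet string.
labels = array('d', [1.0, 0.0, 1.0])

try:
    strip_tickers(labels)
    error_message = None
except TypeError as exc:
    # re.sub refuses to apply a str pattern to a non-str buffer object.
    error_message = str(exc)

print(error_message)

# With an actual tweet string, the same function works as intended.
cleaned = strip_tickers("Bought $GE today")
print(repr(cleaned))
```

So the function itself is fine; the traceback only surfaces deep inside re because that is the first place the argument's type actually matters.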

A perfectly correct function can still throw errors if you call it incorrectly. 🤓


Thanks. I saw the error. I replaced test_y with tweet in the call to predict_tweet, and the error is fixed.
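
For anyone who lands here with the same traceback, the corrected loop looks roughly like the sketch below. This is not the assignment's code: accuracy_sketch is a hypothetical function that mirrors the loop shown in the traceback, the predict_tweet here is a stub that returns a made-up probability, and plain lists are used in place of the NumPy arrays from the notebook.

```python
def predict_tweet(tweet, freqs, theta):
    # Hypothetical stub: the real function returns a sigmoid probability.
    # This one only checks that it received a single tweet string.
    if not isinstance(tweet, str):
        raise TypeError("predict_tweet expects one tweet (str)")
    return 0.9 if "good" in tweet else 0.1


def accuracy_sketch(test_x, test_y, freqs, theta, predict_tweet=predict_tweet):
    y_hat = []
    for tweet in test_x:
        # Pass the loop variable (one tweet), not test_y (all the labels).
        y_pred = predict_tweet(tweet, freqs, theta)
        y_hat.append(1.0 if y_pred > 0.5 else 0.0)
    # Fraction of predictions that match the true labels.
    matches = sum(1 for pred, label in zip(y_hat, test_y) if pred == label)
    return matches / len(test_y)


accuracy = accuracy_sketch(["good day", "bad day"], [1.0, 0.0], freqs={}, theta=None)
print(f"accuracy = {accuracy:.4f}")
```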
