I get an error from the process_tweet function, even though all the tests for this function passed earlier in the assignment.
print([type(tweet) for tweet in test_x[:5]])
TypeError Traceback (most recent call last)
in
----> 1 tmp_accuracy = test_logistic_regression(test_x, test_y, freqs, theta)
2 print(f"Logistic regression model's accuracy = {tmp_accuracy:.4f}")
in test_logistic_regression(test_x, test_y, freqs, theta, predict_tweet)
18 for tweet in test_x:
19 # get the label prediction for the tweet
---> 20 y_pred = predict_tweet(test_y, freqs, theta)
21
22 if y_pred > 0.5:
in predict_tweet(tweet, freqs, theta)
12
13 # extract the features of the tweet and store it into x
---> 14 x = extract_features(tweet, freqs)
15
16 # make the prediction using x and theta
in extract_features(tweet, freqs, process_tweet)
9 '''
10 # process_tweet tokenizes, stems, and removes stopwords
---> 11 word_l = process_tweet(tweet)
12
13 # 3 elements for [bias, positive, negative] counts
~/work/utils.py in process_tweet(tweet)
19 stopwords_english = stopwords.words('english')
20 # remove stock market tickers like $GE
---> 21 tweet = re.sub(r'\$\w*', '', tweet)
22 # remove old style retweet text "RT"
23 tweet = re.sub(r'^RT[\s]+', '', tweet)
/opt/conda/lib/python3.7/re.py in sub(pattern, repl, string, count, flags)
190 a callable, it's passed the Match object and must return
191 a replacement string to be used."""
--> 192 return _compile(pattern, flags).sub(repl, string, count)
193
193
194 def subn(pattern, repl, string, count=0, flags=0):
TypeError: cannot use a string pattern on a bytes-like object
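
For reference, here is a minimal sketch that reproduces the same TypeError outside the assignment. It assumes the failing call is the ticker-removal substitution shown at line 21 of utils.py in the traceback. `strip_tickers` is a hypothetical helper written only for this illustration, and the sample strings are made up. Python's `re` module raises this error whenever a str pattern is applied to something that is not a str, such as a bytes object. As far as I know, other buffer-like objects such as NumPy arrays trigger the same message.

```python
import re

def strip_tickers(tweet):
    # Hypothetical helper: same substitution as line 21 of utils.py in the traceback
    return re.sub(r'\$\w*', '', tweet)

# A str argument works as expected: the ticker "$GE" is removed
print(strip_tickers("I bought $GE today"))

# A non-str argument fails with the error from the traceback
try:
    strip_tickers(b"I bought $GE today")
except TypeError as err:
    print(err)  # cannot use a string pattern on a bytes-like object
```

One detail in the traceback may be relevant: the frame at line 20 of test_logistic_regression calls predict_tweet(test_y, freqs, theta), while the loop variable on line 18 is tweet. If test_y is a NumPy array (an assumption, since the post does not show its type), that call would hand process_tweet an array of labels rather than a tweet string. That would be consistent with the error, even though process_tweet itself passes its own tests.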