Question about UNQ_C7

UNQ_C7 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)

GRADED_FUNCTION: preprocess_data

def preprocess_data(train_data, test_data, count_threshold):
    """
    Preprocess data, i.e.,
        - Find tokens that appear at least N times in the training data.
        - Replace tokens that appear less than N times by "<unk>" both for training and test data.
    Args:
        train_data, test_data: List of lists of strings.
        count_threshold: Words whose count is less than this are
                         treated as unknown.

    Returns:
        Tuple of
        - training data with low frequent words replaced by "<unk>"
        - test data with low frequent words replaced by "<unk>"
        - vocabulary of words that appear n times or more in the training data
    """
    ### START CODE HERE (Replace instances of 'None' with your code) ###

    # Get the closed vocabulary using the train data
    vocabulary = get_words_with_nplus_frequency(train_data, count_threshold=count_threshold)

    # For the train data, replace less common words with "<unk>"
    train_data_replaced = replace_oov_words_by_unk(train_data, vocabulary, unknown_token="<unk>")

    # For the test data, replace less common words with "<unk>"
    test_data_replaced = replace_oov_words_by_unk(test_data, vocabulary, unknown_token="<unk>")

    ### END CODE HERE ###
    return train_data_replaced, test_data_replaced, vocabulary


Can anyone help me find where I made a mistake? My replace_oov_words_by_unk function was tested and passed.

Hi Binqi_Lian,

You may not want to hardcode unknown_token in replace_oov_words_by_unk.

Yes, I tried. But it still shows me an error.

    ### START CODE HERE ###

    # Get the closed vocabulary using the train data
    vocabulary = get_words_with_nplus_frequency(train_data, count_threshold=count_threshold)

    # For the train data, replace less common words with "<unk>"
    train_data_replaced = replace_oov_words_by_unk(train_data, vocabulary)

    # For the test data, replace less common words with "<unk>"
    test_data_replaced = replace_oov_words_by_unk(test_data, vocabulary)

    ### END CODE HERE ###
    return train_data_replaced, test_data_replaced, vocabulary

In that case there’s a problem somewhere else. If you cannot find it, feel free to send me your notebook as an attachment to a direct message so I can have a look.

I just sent my notebook to you. Thank you so much!

Hi Binqi-Lian,

You have to include unknown_token as a parameter in replace_oov_words_by_unk; just don’t hardcode it!
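
For anyone hitting the same error, here is a minimal sketch of what "keep the parameter, don't hardcode it" could look like. The graded versions of these helpers are not shown in this thread, so both function bodies below are assumptions written only for illustration; only the names and the unknown_token parameter come from the posts above.

```python
from collections import Counter


def get_words_with_nplus_frequency(tokenized_sentences, count_threshold):
    """Assumed helper: return the words that occur at least count_threshold times."""
    counts = Counter(token for sentence in tokenized_sentences for token in sentence)
    return [word for word, count in counts.items() if count >= count_threshold]


def replace_oov_words_by_unk(tokenized_sentences, vocabulary, unknown_token="<unk>"):
    """Assumed helper: replace every token that is not in vocabulary with unknown_token."""
    vocabulary = set(vocabulary)  # set lookup is faster than scanning a list
    return [
        # Use the parameter here. Writing the literal "<unk>" instead would
        # silently ignore whatever token the caller passes in.
        [token if token in vocabulary else unknown_token for token in sentence]
        for sentence in tokenized_sentences
    ]


train = [["sky", "is", "blue", "."], ["leaves", "are", "green", "."]]
test = [["roses", "are", "red", "."]]

vocabulary = get_words_with_nplus_frequency(train, count_threshold=2)
print(vocabulary)                                                         # ['.']
print(replace_oov_words_by_unk(test, vocabulary))                         # [['<unk>', '<unk>', '<unk>', '.']]
print(replace_oov_words_by_unk(test, vocabulary, unknown_token="<oov>"))  # [['<oov>', '<oov>', '<oov>', '.']]
```

A version that hardcodes "<unk>" in the function body gives the same result for the default call but would return "<unk>" in the last call too. A check that passes a different token would catch that, which may be why the function's own tests passed while a later cell failed.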