Problem with remove_stopwords()

Hi all,

I am having trouble understanding the difference between using the Tokenizer's word_index and the built-in split() function inside remove_stopwords(). Both yield the same result in the test for that function, but the problem appears when the function is used in parse_data_from_file().

  1. When using the Tokenizer's word_index in remove_stopwords():

My function looks like this:

The outcome of parse_data_from_file() is this:

  2. When using sentence.split():

My remove_stopwords() function is as follows:

The output of parse_data_from_file() is as expected:

Does anyone know why the Tokenizer option is not working? Thank you in advance.

Tokenizer uses a much more sophisticated mechanism to create tokens from a sentence than simply splitting on spaces. For this assignment, please split on spaces to create tokens.
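A minimal sketch of why the two approaches can diverge (this is not the course code; the stopword list and function names here are made up for illustration): Keras's Tokenizer lowercases text and strips punctuation by default before producing tokens, while a plain sentence.split() keeps capitalization and trailing punctuation, so a word like "The" or "mat." can slip past the stopword check.

```python
import string

# Assumed stopword list, purely for illustration
STOPWORDS = {"the", "is", "on"}

def remove_stopwords_split(sentence):
    # Naive approach: split on spaces only.
    # Capitalization and punctuation stay attached to the words.
    words = sentence.split()
    return " ".join(w for w in words if w not in STOPWORDS)

def remove_stopwords_normalized(sentence):
    # Tokenizer-like behaviour: lowercase and strip punctuation
    # before matching, so "The" and "mat." are normalized first.
    cleaned = sentence.lower().translate(
        str.maketrans("", "", string.punctuation))
    words = cleaned.split()
    return " ".join(w for w in words if w not in STOPWORDS)

sentence = "The cat is on the mat."
print(remove_stopwords_split(sentence))       # "The cat mat." - "The" survives
print(remove_stopwords_normalized(sentence))  # "cat mat"
```

The test for remove_stopwords() alone may pass with either version, but downstream in parse_data_from_file() the differing tokens (capitalized or punctuated words surviving in one version) produce different output.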

Please remove code from your posts. It's okay to leave outputs and stack traces, though.