Parse_data_from_file

How do I proceed with this?

Please ask specific questions about the assignment rather than generic questions like how to implement a full graded function. It would help if you go through the markdown cells and lectures prior to posting such questions. They provide good context on what should be done as part of parsing an input csv file.

Thanks for the information. I have gone through the markdown cells, but I couldn’t figure out what to code in this function, and it was not taught in the lectures.

I’ve asked on the mentor forum for a mentor / staff to help you from scratch and improve the assignment.
Good luck.

Where will I be contacted by the mentor?

If you don’t hear back within 48 hours, please ping me on this thread and I’ll help you from scratch.

I just want to let you know that I was not contacted by anyone.

The file bbc-text.csv consists of 2 columns:

  1. The 1st column is called category, which is the label associated with an article. The first few labels are: 'tech', 'business', 'sport', 'sport', 'entertainment'
  2. The 2nd column is called text which contains the actual news article.

Your model should be able to predict the label associated with a news article.

From the perspective of a multi-class classification problem like this one, stopwords don’t contribute much towards model performance (i.e. classification accuracy). So, they can be removed.

parse_data_from_file

  1. This function is responsible for reading data for further processing. Here, sentences are going to be processed further to become model inputs, and labels are the class labels.
  2. For each row in the csv file:
    a. Read each row in the csv file.
    b. Remove stopwords from text column and then add it to sentences list.
    c. Add category to the labels list.
  3. Return sentences and labels from the function.
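The steps above can be sketched roughly as follows. This is only a minimal illustration, not the graded solution: the `STOPWORDS` set is a hypothetical, shortened list (the assignment provides its own), the helper name `remove_stopwords` is an assumption, and the sample CSV is made up so the snippet can run on its own.

```python
import csv
import os
import tempfile

# Hypothetical, shortened stopword list for illustration only;
# the assignment ships its own, much longer list.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def remove_stopwords(sentence):
    """Lowercase the sentence and drop every word found in STOPWORDS."""
    words = sentence.lower().split()
    return " ".join(w for w in words if w not in STOPWORDS)


def parse_data_from_file(filename):
    """Read a two-column CSV (category, text) and return (sentences, labels)."""
    sentences, labels = [], []
    with open(filename, "r", newline="") as csvfile:
        reader = csv.reader(csvfile, delimiter=",")
        next(reader)  # skip the header row: category,text
        for row in reader:
            labels.append(row[0])                       # 1st column: category
            sentences.append(remove_stopwords(row[1]))  # 2nd column: text
    return sentences, labels


# Made-up sample file so the sketch is runnable without bbc-text.csv.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("category,text\n")
    tmp.write("tech,the future of tv is in the hands of viewers\n")
    tmp.write("sport,a late goal\n")
    sample_path = tmp.name

sentences, labels = parse_data_from_file(sample_path)
os.remove(sample_path)

print(sentences)  # ['future tv hands viewers', 'late goal']
print(labels)     # ['tech', 'sport']
```

Note that the file is opened through the `filename` argument, so the same function works for any CSV with this layout.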

Sentences and labels are converted to numbers for use as input / output to the model. So, use a tokenizer to fit and transform data.
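To make the "fit and transform" idea concrete, here is a small pure-Python stand-in. It is not the Keras tokenizer the course uses and assigns ids in a different order (first appearance rather than frequency); the function names `fit_word_index` and `texts_to_sequences` and the sample data are assumptions chosen for illustration.

```python
def fit_word_index(texts):
    """'Fit': give each distinct word an integer id, starting at 1,
    in order of first appearance."""
    word_index = {}
    for text in texts:
        for word in text.split():
            if word not in word_index:
                word_index[word] = len(word_index) + 1
    return word_index


def texts_to_sequences(texts, word_index, oov_id=0):
    """'Transform': replace each word by its id; unseen words map to oov_id."""
    return [[word_index.get(w, oov_id) for w in text.split()] for text in texts]


# Made-up data in the shape returned by a parsing function.
sentences = ["future tv hands viewers", "late goal tv"]
labels = ["tech", "sport"]

word_index = fit_word_index(sentences)
sequences = texts_to_sequences(sentences, word_index)

# Labels get their own index, fitted separately from the sentences.
label_index = fit_word_index(labels)
label_ids = [label_index[label] for label in labels]

print(sequences)  # [[1, 2, 3, 4], [5, 6, 2]]
print(label_ids)  # [1, 2]
```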

With this information, please take a crack at the assignment. Reply to this thread with clarifications you need.

Can you please point out the error in my notebook? Why am I getting different figures for the miniature dataset?
Notebook attached

Thanks in advance

ORIGINAL DATASET:

There are 2225 sentences in the dataset.

First sentence has 436 words (after removing stopwords).

There are 2225 labels in the dataset.

The first 5 labels are ['tech', 'business', 'sport', 'sport', 'entertainment']

MINIATURE DATASET:

There are 2225 sentences in the miniature dataset.

First sentence has 436 words (after removing stopwords).

There are 2225 labels in the miniature dataset.

The first 5 labels are ['tech', 'business', 'sport', 'sport', 'entertainment']

Expected Output:

ORIGINAL DATASET:

There are 2225 sentences in the dataset.

First sentence has 436 words (after removing stopwords).

There are 2225 labels in the dataset.

The first 5 labels are ['tech', 'business', 'sport', 'sport', 'entertainment']


MINIATURE DATASET:

There are 5 sentences in the miniature dataset.

First sentence has 436 words (after removing stopwords).

There are 5 labels in the miniature dataset.

The first 5 labels are ['tech', 'business', 'sport', 'sport', 'entertainment']

[code removed - moderator]

@Sumit1

You are hardcoding the path to "./data/bbc-text.csv" instead of using the function parameter filename.
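A small, self-contained illustration of why this produces identical counts for both datasets. The file names, row counts, and function names here are made up for the demo; only the pattern (ignoring the `filename` parameter versus using it) mirrors the issue described.

```python
import csv
import os
import tempfile


def write_csv(path, n_rows):
    """Write a header plus n_rows dummy (category, text) rows."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["category", "text"])
        for i in range(n_rows):
            writer.writerow(["tech", f"article {i}"])


tmp_dir = tempfile.mkdtemp()
FULL_PATH = os.path.join(tmp_dir, "full.csv")  # stands in for the full dataset
MINI_PATH = os.path.join(tmp_dir, "mini.csv")  # stands in for the miniature one
write_csv(FULL_PATH, 20)
write_csv(MINI_PATH, 5)


def count_rows_hardcoded(filename):
    """Buggy: ignores `filename` and always opens FULL_PATH."""
    with open(FULL_PATH, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip header
        return sum(1 for _ in reader)


def count_rows(filename):
    """Fixed: opens whatever file the caller passed in."""
    with open(filename, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip header
        return sum(1 for _ in reader)


buggy_counts = (count_rows_hardcoded(FULL_PATH), count_rows_hardcoded(MINI_PATH))
fixed_counts = (count_rows(FULL_PATH), count_rows(MINI_PATH))

print(buggy_counts)  # (20, 20) - same result regardless of the argument
print(fixed_counts)  # (20, 5)
```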