Why didn't lab 1 go over lowercasing even though it was mentioned in the notebook?

here2infinity · August 30, 2024, 8:02pm

I am just curious. Is this the simplest thing to do? And if it is, should this also be the last thing to do as we would have fewer characters to go over?

Nevermnd · August 30, 2024, 8:14pm

@here2infinity if it is a string, yes (easiest thing to do):
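A minimal sketch of what that looks like in Python, assuming the text is a plain `str` (the sample string is made up for illustration):

```python
# Hypothetical sample text; any Python str behaves the same way.
text = "Hello World, NLP Is Fun!"

# str.lower() returns a new string; the original is left unchanged.
lowered = text.lower()

print(lowered)  # hello world, nlp is fun!
```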

I’m not sure I see how it would lower the number of characters, though (unless you mean in some ASCII/Unicode sense).

I think a large part of the reason for going to lowercase is that the word could, potentially, end up anywhere in the sentence. If it comes at the beginning of your output, it is easy to just re-capitalize the first letter.

But if you keep the capitalization attached, and then that recorded output word ends up in the middle of a sentence, you have a problem.
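To make that concrete, here is a rough sketch, assuming words are stored in lowercase and only capitalized when a sentence is assembled. The helper name `render_sentence` and the sample words are hypothetical, not something from the lab:

```python
def render_sentence(words):
    """Join lowercase words and capitalize only the first letter."""
    if not words:
        return ""
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:]


vocab_word = "movie"  # stored in lowercase

start = render_sentence([vocab_word, "was", "great"])
middle = render_sentence(["the", vocab_word, "was", "great"])

print(start)   # Movie was great
print(middle)  # The movie was great
```

Had the word been stored as "Movie", the second sentence would have come out as "The Movie was great", and there is no easy way to tell that the capital letter was only positional.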

Deepti_Prasad · August 30, 2024, 8:17pm

hi @here2infinity

it is always better to share a screenshot of the doubt or query you are asking about!!

You mentioned lab 1, but is it an ungraded or a graded lab? Does the lowercasing you are asking about come from the code in a graded cell? (If so, don’t post the code here.)

Although every lab focuses on a task taught in the corresponding week, you will surely come across this task in the coming weeks and in the next course of the NLP specialization.

Also, lowercasing is used mostly when parsing data in the preprocessing step of NLP, as putting the data in the same format helps with a comprehensive analysis.
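As a small illustration of the "same format" point, here is a sketch that counts word frequencies with and without lowercasing. The token list is invented for the example:

```python
from collections import Counter

# Hypothetical tokens; in practice these would come from the tokenized text.
tokens = ["Great", "movie", "great", "acting", "GREAT", "Movie"]

raw_counts = Counter(tokens)
lower_counts = Counter(t.lower() for t in tokens)

print(raw_counts["great"])    # 1
print(lower_counts["great"])  # 3
print(len(raw_counts), len(lower_counts))  # 6 3
```

Without lowercasing, "Great", "great" and "GREAT" are treated as three separate vocabulary entries.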

Keep Learning!!

Regards
DP

here2infinity · August 30, 2024, 8:30pm

I meant that after stemming and removing punctuation, you have fewer characters overall to go through when you lowercase them. It is still O(n), but the n is now smaller.
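A rough sketch of that comparison, using only punctuation removal (stemming is left out to keep it dependency-free) and a made-up sample string:

```python
import string

# Hypothetical sample text.
text = "Hello, World!!! NLP is FUN... right?"

# Translation table that deletes every ASCII punctuation character.
strip_punct = str.maketrans("", "", string.punctuation)

# Order A: lowercase first, then strip punctuation.
chars_lowered_a = len(text)
result_a = text.lower().translate(strip_punct)

# Order B: strip punctuation first, then lowercase the shorter string.
stripped = text.translate(strip_punct)
chars_lowered_b = len(stripped)
result_b = stripped.lower()

print(result_a == result_b)              # True
print(chars_lowered_a, chars_lowered_b)  # 36 28
```

Both orders give the same result here, and the second one lowercases fewer characters. One caveat: if a later step compares tokens against a lowercase list (a stop-word list, for instance), lowercasing has to happen before that step, so it cannot always go last.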

Nevermnd · August 30, 2024, 8:34pm

I have to go out for a bit, maybe @Deepti_Prasad can help you on this one.

Deepti_Prasad · August 30, 2024, 8:53pm

The probable reason for not lowercasing is to preserve the model's ability to predict the capitalization of nouns and sentence beginnings.

But yes, lowercasing does help with analysis by putting the text in the same format; it basically depends on the kind of data you are handling. Even if they had lowercased the data in this lab's text-processing step, they would then have needed another task, POS (part-of-speech) tagging, which comes in a later part of the course.
