I am just curious. Is this the simplest thing to do? And if it is, should this also be the last thing to do as we would have fewer characters to go over?
@here2infinity if it is a string, yes (easiest thing to do):
I’m not sure I see how it would lower the number of characters, though (unless you mean in some ASCII/Unicode sense).
I think a large part of the reason for going to lowercase is that the word could potentially be anywhere in the sentence. If it comes at the beginning of your output, it is easy to just re-capitalize the first letter.
But if you keep the capitalization attached, and then that recorded output word ends up in the middle of a sentence, you have a problem.
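To illustrate the point about position in a sentence, here is a minimal sketch in plain Python. The helper names `normalize` and `build_sentence` are hypothetical and not taken from the lab; the idea is only that words stored in lowercase can be re-capitalized once they land at the start of a sentence.

```python
def normalize(word):
    """Store a word in lowercase so its original position does not matter."""
    return word.lower()


def build_sentence(words):
    """Join stored lowercase words and capitalize only the first letter."""
    sentence = " ".join(words)
    return sentence[:1].upper() + sentence[1:] + "."


# Hypothetical recorded output words, originally with mixed capitalization
stored = [normalize(w) for w in ["Learning", "is", "Fun"]]

# The same stored words work at the start or in the middle of a sentence
sentence_a = build_sentence(stored)
sentence_b = build_sentence(["today"] + stored)

print(sentence_a)
print(sentence_b)
```

If "Learning" had been stored with its capital letter, the second sentence would end up with a stray capital in the middle.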
It is always better to share a screenshot of what you are asking about!
You mentioned lab 1, but is it the ungraded lab or the graded one? If the lowercasing you are asking about comes from a graded cell, please don’t post the code here.
Although every lab focuses on a task taught in the corresponding week, you will surely come across this task again in the coming weeks and in the next course of the NLP specialization.
Also, lowercasing is used mostly when parsing data in the preprocessing step of NLP, since putting the data in the same format helps with comprehensive analysis.
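As a rough illustration of "same format" (a sketch with made-up tokens, not code from the lab), counting word frequencies with and without lowercasing shows how case variants get merged:

```python
from collections import Counter

# Hypothetical tokens; the same word appears with three different casings
tokens = ["Happy", "happy", "HAPPY", "sad"]

# Without lowercasing, each casing is counted as a separate word
raw_counts = Counter(tokens)

# With lowercasing, all variants collapse into one entry
lower_counts = Counter(t.lower() for t in tokens)

print(raw_counts)
print(lower_counts)
```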
Keep Learning!!
Regards
DP
I meant that after stemming and removing punctuation, there are fewer characters overall to go through when you lowercase them. It is still O(n), but the n is now smaller.
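Here is a small sketch of what I mean, using only the standard library. The sample text is made up, and stemming is left out so the snippet has no external dependency; with a real stemmer the order of the steps may matter more than it does here.

```python
import string

# Hypothetical input text
text = "Hello, World!!! NLP is FUN... right?"

# Translation table that deletes ASCII punctuation
remove_punct = str.maketrans("", "", string.punctuation)

# Option 1: lowercase first, so lower() scans the full text
chars_lower_first = len(text)
result_first = text.lower().translate(remove_punct)

# Option 2: strip punctuation first, so lower() scans a shorter string
stripped = text.translate(remove_punct)
chars_lower_last = len(stripped)
result_last = stripped.lower()

print(chars_lower_first, chars_lower_last)
print(result_first == result_last)
```

Both orders give the same string for this input; the only difference is how many characters `lower()` has to go over.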
I have to go out for a bit, maybe @Deepti_Prasad can help you on this one.
The probable reason for not lowercasing is to keep the model able to predict the capitalization of nouns and of sentence beginnings.
But yes, lowercasing does help with analyzing text in a consistent format; it basically depends on the kind of data you are handling. Even if they had lowercased the data in the text processing step of this lab, they would then have needed another task, POS (part-of-speech) tagging, which comes in a later part of the course.