POS tagging: should words with capital letters be treated as lower case?

Zhenwe · November 2, 2025, 11:06am

When extracting the word list, we didn’t convert words to lower case, so the word ‘The’ is different from ‘the’. But shouldn’t they be treated as the same word, since ‘The’ and ‘the’ should probably have the same POS tag?

gent.spah · November 4, 2025, 5:51am

I think this is right. From what I remember, there is a preprocessing step that converts words to lower case, or maybe the tokenizer does that! Unless the case is left as-is intentionally, because capitalization can carry some meaning for the word.
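
To make the difference concrete, here is a minimal sketch of how case folding changes a word list. It is not the assignment's code: the toy corpus and the helper name `build_vocab` are made up for illustration.

```python
from collections import Counter

def build_vocab(words, lowercase=False):
    """Count word frequencies, optionally folding case first (hypothetical helper)."""
    if lowercase:
        words = [w.lower() for w in words]
    return Counter(words)

# Toy corpus, invented for illustration only.
corpus = ["The", "dog", "chased", "the", "cat", ".", "The", "cat", "ran", "."]

cased = build_vocab(corpus)                   # 'The' and 'the' stay separate
folded = build_vocab(corpus, lowercase=True)  # 'The' and 'the' are merged

print(cased["The"], cased["the"])  # two separate entries
print(folded["the"])               # one merged entry
print(len(cased), len(folded))     # vocabulary size with and without folding
```

Folding case shrinks the vocabulary and pools the counts for ‘The’ and ‘the’, which probably helps for function words like these. The trade-off is that capitalization can be a useful signal for a tagger, for example to tell a proper noun like ‘Apple’ from the common noun ‘apple’, so whether to lowercase depends on the task.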
