Hi, can somebody explain what is wrong? The markdown cell says that the text can be accessed using row[1] and the labels using row[0], but when I code it, I get an index error.
If you’re iterating on reader, how many rows of data from the file are you getting for each iteration?
I don’t know, because I don’t understand the documentation for csv.reader.
The error is thrown when it calls remove_stopword(row[1]).
Have you reviewed what parameter you are passing to remove_stopword? What is the value of row[1]? What is remove_stopword expecting? What dataset is used in remove_stopword, and does row[1] make sense for that dataset?
That is what I would start doing.
csv.reader(csvfile, dialect='excel', **fmtparams)
Return a reader object which will iterate over lines in the given csvfile.
Each row read from the csv file is returned as a list of strings.
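To make the documentation excerpt above concrete, here is a minimal sketch of what iterating over a reader yields. The two-column sample data is made up for illustration and stands in for the real file, which is not shown in this thread.

```python
import csv
import io

# Hypothetical two-column data standing in for the real file.
sample = io.StringIO(
    "label,text\n"
    "sport,the match was great\n"
    "tech,a new phone is out\n"
)

# The default 'excel' dialect splits each line on commas.
reader = csv.reader(sample)
rows = list(reader)

# Each row is a list of strings, one element per column.
for row in rows:
    print(len(row), row)
```

Printing len(row) alongside the row is a quick way to check whether row[1] exists before indexing into it.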
I still don’t fully understand it. What I have gathered is that the first word of the csv file is ‘label’ and all the other data is ‘text’, and that the csv.reader function basically returns each line in the form of a list. Am I correct up until now? If I am, how do I access each line to separate label and text, and then apply the stopword function on text?
It has worked now. I was making three mistakes:
- using the wrong delimiter
- not skipping the first line
- appending label after using stopword (I don’t know if it is a mistake or not, can you confirm?)
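The three points above can be sketched roughly as follows. This is not the assignment's actual code: the comma delimiter, the tiny stopword set, the parse_data helper, and the body of remove_stopword are all assumptions made for illustration.

```python
import csv
import io

STOPWORDS = {"a", "the", "is", "was"}  # hypothetical, deliberately tiny list


def remove_stopword(sentence):
    """Stand-in for the thread's function: drop stopwords from a sentence."""
    words = sentence.lower().split()
    return " ".join(w for w in words if w not in STOPWORDS)


def parse_data(csvfile, delimiter=","):
    """Hypothetical helper: return (labels, sentences) from a two-column CSV."""
    labels, sentences = [], []
    reader = csv.reader(csvfile, delimiter=delimiter)
    next(reader, None)  # skip the header row
    for row in reader:
        labels.append(row[0])
        sentences.append(remove_stopword(row[1]))
    return labels, sentences


sample = io.StringIO(
    "label,text\n"
    "sport,The match was great\n"
    "tech,A new phone is out\n"
)
labels, sentences = parse_data(sample)
print(labels)
print(sentences)
```

On the third point: within one loop iteration, appending the label before or after calling the stopword function gives the same result, because each list still grows by exactly one item per row and the two stay aligned.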
Yes, the first row is going to be the column headers/labels. Sometimes when reading a .csv file you want these, but for different purposes than the rest of the data. Subsequent rows are the data, each carried as a list of strings, with one element of the list per CSV column. In this case, the first string is the label and the second string is the text. If you were using the wrong delimiter, you probably got a list of strings with a single element, which is why row[1] threw an exception.
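The single-element behaviour described above is easy to reproduce. A small sketch, assuming a comma-separated file that is mistakenly read with a semicolon delimiter (the sample data and the choice of ';' are made up for illustration):

```python
import csv
import io

data = "label,text\nsport,the match was great\n"

# Wrong delimiter: no ';' in the line, so the whole line is one field.
wrong = list(csv.reader(io.StringIO(data), delimiter=";"))
# Matching delimiter: the line splits into label and text.
right = list(csv.reader(io.StringIO(data), delimiter=","))

print(wrong[1])
print(right[1])

# Indexing the second column of a one-element row raises IndexError.
try:
    wrong[1][1]
    raised = False
except IndexError:
    raised = True
print("IndexError raised:", raised)
```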