Hello
I found a discrepancy in the shape of my padded array (padded.shape).
I get (2225, 2441), whereas the expected output is (2225, 2442).
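For context, the second number in padded.shape is the length of the longest tokenized article. This assumes the assignment pads without a maxlen argument, so every sequence is padded up to the longest one. The plain-Python sketch below uses a hypothetical helper, pad_to_longest, as a stand-in for that padding (it is not the Keras function). It shows that a single extra token kept in the longest sequence is enough to add one column to the shape, just as 2441 and 2442 differ by one.

```python
def pad_to_longest(sequences, value=0):
    """Hypothetical stand-in for padding without maxlen: append `value`
    to each sequence until it is as long as the longest one."""
    longest = max(len(seq) for seq in sequences)
    return [seq + [value] * (longest - len(seq)) for seq in sequences]


def padded_shape(sequences):
    # (number of sequences, length after padding)
    padded = pad_to_longest(sequences)
    return (len(padded), len(padded[0]))


# Toy token-id sequences: identical except that, in the second list,
# the longest sequence keeps one extra token (e.g. an unremoved stopword).
without_extra = [[4, 7, 9], [5, 2]]
with_extra = [[4, 7, 3, 9], [5, 2]]
print(padded_shape(without_extra))  # (2, 3)
print(padded_shape(with_extra))     # (2, 4)
```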
While studying the code given in the next assignment, which produces your result, I found the word “is” still present in the sentences list at row 408.
You can search for: …government. believe government is least…
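One possible mechanism, not confirmed for row 408: Python's str.replace, which the assignment code uses, replaces non-overlapping matches from left to right (pandas' str.replace behaves the same way per row). If the same stopword appears twice separated by a single space, both matches would need that shared space, so only the first occurrence is removed. The strings below are made up for illustration.

```python
# Two consecutive "is" separated by ONE space: the first match " is "
# consumes the shared space, so the second "is" is not matched.
text = "x is is y"
cleaned = text.replace(" is ", " ")
print(repr(cleaned))  # 'x is y'

# Separated by TWO spaces: each "is" has its own surrounding spaces,
# so both are removed (leaving a double space behind).
text2 = "x is  is y"
cleaned2 = text2.replace(" is ", " ")
print(repr(cleaned2))  # 'x  y'
```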
Instead of the csv module, I use pandas.
My code is:
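```python
import numpy as np
import pandas as pd

# `stopwords` is the list defined earlier in the assignment notebook.
mysentences = []
mylabels = []
data_file = "./bbc-text.csv"
data_text = pd.read_csv(data_file, delimiter=',')

# Remove each stopword where it appears as a whole, space-delimited token.
for word in stopwords:
    data_text['text'] = data_text['text'].str.replace(" " + word + " ", " ", regex=False)

# Collapse the leftover runs of spaces until no double space remains.
while data_text['text'].str.contains("  ").any():
    data_text['text'] = data_text['text'].str.replace("  ", " ", regex=False)

mysentences = data_text.values[:, 1:].squeeze().tolist()
mylabels = data_text.values[:, 0].squeeze().tolist()
```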
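The while loop in the pandas version keeps replacing double spaces until none are left, so every run of spaces ends up as a single space. A single regular-expression substitution does the same in one pass. The sketch below checks this equivalence in plain Python with re.sub; in pandas the corresponding call would be Series.str.replace(r" {2,}", " ", regex=True). The function names are hypothetical.

```python
import re


def collapse_with_loop(s):
    # Same idea as the while loop: replace "  " by " " until none remain.
    while "  " in s:
        s = s.replace("  ", " ")
    return s


def collapse_with_regex(s):
    # One pass: any run of two or more spaces becomes a single space.
    return re.sub(r" {2,}", " ", s)


for sample in ["a  b", "a   b    c", "no doubles here", "     "]:
    assert collapse_with_loop(sample) == collapse_with_regex(sample)

print(collapse_with_regex("a   b    c"))  # a b c
```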
The code that produces your result is:
```python
import csv

# `stopwords` is the list defined earlier in the assignment notebook.
sentences = []
labels = []
with open("./bbc-text.csv", 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    next(reader)  # skip the header row
    for row in reader:
        labels.append(row[0])
        sentence = row[1]
        for word in stopwords:
            token = " " + word + " "
            sentence = sentence.replace(token, " ")
            sentence = sentence.replace("  ", " ")
        sentences.append(sentence)
```
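A minimal sketch comparing the two orders of operations. It assumes the double-space replacement in the assignment code runs inside the stopword loop; the indentation was lost in the post, so this placement is an assumption. The helper names clean_like_assignment, clean_collapse_at_end and differing_rows are hypothetical, and the toy stopword list and texts are made up. With a double space between two copies of a stopword, the assignment loop collapses it as soon as an earlier stopword has been processed, and the non-overlapping replacement then removes only one copy. Removing all stopwords before collapsing spaces removes both.

```python
def clean_like_assignment(sentence, stopwords):
    # Assumed structure of the assignment loop: after each stopword is
    # removed, double spaces are collapsed once.
    for word in stopwords:
        sentence = sentence.replace(" " + word + " ", " ")
        sentence = sentence.replace("  ", " ")
    return sentence


def clean_collapse_at_end(sentence, stopwords):
    # Mirrors the pandas version: remove every stopword first,
    # then collapse runs of spaces until none are left.
    for word in stopwords:
        sentence = sentence.replace(" " + word + " ", " ")
    while "  " in sentence:
        sentence = sentence.replace("  ", " ")
    return sentence


def differing_rows(texts, stopwords):
    # Indices of texts for which the two cleaning orders disagree.
    return [
        i
        for i, text in enumerate(texts)
        if clean_like_assignment(text, stopwords) != clean_collapse_at_end(text, stopwords)
    ]


# Toy data, not the BBC file.
toy_stopwords = ["a", "is"]
toy_texts = ["x is  is y", "x is a y"]
print(clean_like_assignment(toy_texts[0], toy_stopwords))  # x is y
print(clean_collapse_at_end(toy_texts[0], toy_stopwords))  # x y
print(differing_rows(toy_texts, toy_stopwords))            # [0]
```

Calling differing_rows on the raw text column (before any cleaning) with the assignment's stopwords would list the articles where the two versions disagree. Whether such a row is also the longest article, and therefore affects padded.shape, would still need to be checked.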
Could you please tell me whether my code is incorrect and, if so, where I went wrong?
Thanks