In the W1 assignment the training data is sorted by class, so we feed the learning algorithm all the positive examples first and then all the negative ones. Should the approach be to shuffle the training data prior to training?
Shuffling matters when you split a dataset into train, dev, and test sets, so that all three end up with the same distribution. Within the training set itself the data is already labeled, so shuffling it has no effect here.
I guess you’re asking why we do not shuffle the training data before training, and the answer is that we just count the words belonging to one class or the other. Shuffling would not make any difference: the counts would be the same (e.g., the word “bad” would appear the same number of times in the negative category whether we shuffle or not).
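To see that the per-class word counts are order-invariant, here is a small sketch with made-up example data (the sentences and labels are purely illustrative, not from the assignment):

```python
import random
from collections import Counter

def count_words_by_class(examples):
    """Count word occurrences per class label.

    examples: list of (text, label) pairs.
    Returns a dict mapping label -> Counter of word frequencies.
    """
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.split())
    return counts

# Toy corpus sorted by class: all negatives (0) first, then positives (1).
data = [
    ("bad movie", 0),
    ("bad acting bad plot", 0),
    ("great film", 1),
    ("great fun", 1),
]

sorted_counts = count_words_by_class(data)

# Shuffle the same examples and count again.
shuffled = data[:]
random.shuffle(shuffled)
shuffled_counts = count_words_by_class(shuffled)

# The counts are identical regardless of presentation order.
assert sorted_counts == shuffled_counts
print(sorted_counts[0]["bad"])   # 3
print(sorted_counts[1]["great"]) # 2
```

Since the model is built purely from these aggregate counts, any permutation of the training examples produces exactly the same counts, hence the same trained model.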