In the W1 assignment the training data is sorted by class, so we feed the learning algorithm all the positive examples first and then all the negative ones. Should the approach be to shuffle the training data prior to training?
Shuffling matters when you split a dataset into train, dev, and test sets, so that all three end up with the same distribution. Within the training set itself the data is already labeled, so shuffling it has no effect here.
I guess you’re asking why we do not shuffle the training data before training, and the answer is that we just count the words belonging to one class or the other. Shuffling would not make any difference: the counts would be the same (e.g., the word “bad” would appear the same number of times in the negative category whether we shuffle or not).
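To see that the per-class word counts are order-invariant, here is a small sketch with made-up example data (the sentences and labels are purely illustrative, not from the assignment):

```python
import random
from collections import Counter

def count_words_by_class(examples):
    """Count word occurrences per class label.

    examples: list of (text, label) pairs.
    Returns a dict mapping label -> Counter of word frequencies.
    """
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.split())
    return counts

# Toy corpus sorted by class: all negatives (0) first, then positives (1).
data = [
    ("bad movie", 0),
    ("bad acting bad plot", 0),
    ("great film", 1),
    ("great fun", 1),
]

sorted_counts = count_words_by_class(data)

# Shuffle the same examples and count again.
shuffled = data[:]
random.shuffle(shuffled)
shuffled_counts = count_words_by_class(shuffled)

# The counts are identical regardless of presentation order.
assert sorted_counts == shuffled_counts
print(sorted_counts[0]["bad"])   # 3
print(sorted_counts[1]["great"]) # 2
```

Since the model is built purely from these aggregate counts, any permutation of the training examples produces exactly the same counts, hence the same trained model.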