Tile/copy our data to increase the training set size and reduce the number of training epochs.
I don’t understand it.
If we want to increase the training set size, shouldn't we collect new data instead of copying existing data? How is copying useful?
I understand that each epoch does something similar to gradient descent: if epochs is 10, then gradient descent will be applied 10 times to the parameters (w and b). Is that right? Does copying existing data increase the number of gradient descent updates? If so, why do we still need epochs?
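To make my current understanding concrete, here's a rough sketch (the `train` function, the toy data, and per-example SGD without shuffling are all my own made-up illustration, not code from the course):

```python
# Hypothetical sketch of my understanding (made-up example, not from the course):
# per-example stochastic gradient descent fitting y = w*x + b.
def train(xs, ys, epochs, lr=0.01):
    w, b = 0.0, 0.0
    updates = 0
    for _ in range(epochs):              # one epoch = one full pass over the data
        for x, y in zip(xs, ys):
            err = w * x + b - y          # prediction error on this example
            w -= lr * 2 * err * x        # one gradient update per example
            b -= lr * 2 * err
            updates += 1
    return w, b, updates

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]                     # data from y = 2x

# 10 epochs over the original data -> 30 updates
w1, b1, u1 = train(xs, ys, epochs=10)
# Tile the data (2 copies) and halve the epochs -> also 30 updates,
# applied in the exact same order (no shuffling), so the result is identical.
w2, b2, u2 = train(xs * 2, ys * 2, epochs=5)
print(u1, u2)              # 30 30
print(w1 == w2, b1 == b2)  # True True
```

In this sketch, tiling the data twice while halving the epochs produces exactly the same sequence of updates, which is why I don't see what copying the data buys us.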