From what I understood, the maximum number of epochs would be 300 when we have 300 examples in the training set. So how does this example end up with 4000 epochs? Thanks in advance.

An “epoch” is one complete pass through the training data. If there are 300 samples, then you process all 300 of them in each epoch. The number of epochs you need for good convergence is a separate choice and is not capped by the number of samples in your training set. In this example, they make a total of 4000 training passes, each one going through all 300 samples.
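To make the distinction concrete, here is a minimal sketch of a training loop. The model (fitting y = w * x), the learning rate, the synthetic data, and the function name `train` are all my own assumptions for illustration, not the code from the example you are looking at. The point is only that the outer loop counts epochs and the inner loop walks over the samples, so the two numbers are set separately.

```python
import random

def train(samples, num_epochs, lr=0.01):
    """Fit y = w * x with per-sample gradient descent (illustrative only).

    Returns the learned weight and the total number of weight updates.
    """
    w = 0.0
    updates = 0
    for epoch in range(num_epochs):   # one epoch = one full pass over the data
        for x, y in samples:          # every sample is visited once per epoch
            grad = 2 * (w * x - y) * x  # gradient of the squared error (w*x - y)^2
            w -= lr * grad
            updates += 1
    return w, updates

# Hypothetical data: 300 samples generated from y = 3x.
random.seed(0)
samples = [(x, 3.0 * x) for x in (random.uniform(-1, 1) for _ in range(300))]

w, updates = train(samples, num_epochs=4000)
print(updates)  # 4000 epochs * 300 samples = 1200000 updates
```

With 300 samples and 4000 epochs, each sample is seen 4000 times. You could just as well pick 10 or 100000 epochs; the size of the training set does not limit it.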