The evidence suggests that “early stopping” at around Epoch 600 would be your best option, at least with your current choices for the other hyperparameters. Training beyond that point is wasted compute: it does not improve the training accuracy, and the test accuracy only gets worse.
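To make the idea concrete, here is a minimal, framework-independent sketch of a patience-based early-stopping rule. The function name `early_stopping_epoch` and the parameters `patience` and `min_delta` are hypothetical names chosen for illustration, not part of any particular library, and the rule assumes you record one held-out accuracy value per epoch (ideally from a validation split rather than the final test set).

```python
def early_stopping_epoch(scores, patience=50, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a list of per-epoch held-out scores.

    Training is considered finished once `patience` epochs have passed
    without the score improving on the best value by more than `min_delta`.
    Higher scores are assumed to be better (e.g. accuracy).
    """
    best_score = float("-inf")
    best_epoch = -1
    for epoch, score in enumerate(scores):
        if score > best_score + min_delta:
            # New best: remember it (this is where you would save a checkpoint).
            best_score = score
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop here.
            return epoch, best_epoch
    # Ran out of epochs without triggering the rule.
    return len(scores) - 1, best_epoch


# Toy curve: accuracy rises, peaks at index 2, then declines.
toy_scores = [0.50, 0.60, 0.70, 0.69, 0.68, 0.67]
stop_epoch, best_epoch = early_stopping_epoch(toy_scores, patience=2)
```

In practice you would keep the model weights saved at `best_epoch` and discard everything trained afterwards. The right `patience` depends on how noisy your accuracy curve is, so treat the default here as a placeholder.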