Hi there,
You want your model to perform better on the training data to get closer to the benchmark.
To do that, you need to allow the model to learn more complex behaviour:
- Decrease regularization, so that complexity is penalized less.
- Train a bigger model (as you have already selected).
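
To illustrate the first point, here is a minimal sketch, assuming a one-feature ridge regression without an intercept. The data and the names `ridge_weight`, `train_mse` and `lam` are made up for illustration and are not taken from your setup. It only shows the general effect: the weaker the penalty, the closer the fit to the training data.

```python
def ridge_weight(xs, ys, lam):
    """Closed-form ridge solution for y ~ w * x (one feature, no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def train_mse(xs, ys, w):
    """Mean squared error of the model w * x on the training data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical training data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# Go from a strong penalty to no penalty at all
for lam in (10.0, 1.0, 0.0):
    w = ridge_weight(xs, ys, lam)
    print(f"lam={lam:5.1f}  w={w:.3f}  train MSE={train_mse(xs, ys, w):.4f}")
```

In this toy setting the training error shrinks as `lam` goes down, which is the direction you want while the model is still underfitting.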
Reducing variance should not be the priority at the moment, since you seem to be underfitting rather than overfitting, and you want the model to learn more complexity, as stated above. After all, your train and dev set errors are in a comparable range. Also, feel free to take a look at this article if you want to see a typical high-variance scenario, which you tend to reach when model complexity is too high: Bias and Variance in Machine Learning - Javatpoint
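
The reasoning above can be sketched as a small check that compares the two gaps. This is only a rough rule of thumb under my own assumptions: the function name `diagnose` and all error rates below are hypothetical, so plug in your own numbers.

```python
def diagnose(benchmark_error, train_error, dev_error):
    """Compare the bias gap with the variance gap and name the larger one."""
    bias_gap = train_error - benchmark_error  # how far training is from the benchmark
    variance_gap = dev_error - train_error    # how far dev is from training
    focus = "bias" if bias_gap > variance_gap else "variance"
    return bias_gap, variance_gap, focus


# Hypothetical error rates: train and dev are close, both far from the benchmark
bias_gap, variance_gap, focus = diagnose(
    benchmark_error=0.05, train_error=0.15, dev_error=0.16
)
print(f"bias gap={bias_gap:.2f}, variance gap={variance_gap:.2f}, work on: {focus}")
```

With train and dev errors this close together, the gap to the benchmark dominates, so adding model capacity is the more promising direction.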
Best regards
Christian