For this question: since we have a low train error but a high dev error, the model has high variance, i.e. it is overfitting.
So, if we add more training data, won't that make the overfitting problem worse? Also, why does more training data help lower the variance? My thought was that with more training data the model can be trained to a higher accuracy, which would lead to a lower bias, not a lower variance.
Have you seen this video?
Getting more data means getting good-quality data that exposes the network to more variation in the input/output mapping. The hope is that, with more data and additional training, the network learns to perform better on unseen data. Generalization is measured by dev-set performance when you have two splits of the dataset. More data does not make overfitting worse; with a fixed model capacity, a larger training set makes it harder for the model to memorize noise, so the gap between train and dev error (the variance) shrinks.
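Here is a minimal NumPy sketch of that idea (the dataset, model degree, and sample sizes are all illustrative choices, not from the course): a fixed high-capacity model, a degree-9 polynomial, is fit to a noisy sine curve with a small vs. a large training set, and we compare the train/dev error gap in each case.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Noisy samples from y = sin(2*pi*x); noise is what an overfit model memorizes.
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, n)
    return x, y

def train_dev_error(n_train, degree=9):
    # Fixed model capacity (degree-9 polynomial); only the data size varies.
    x_tr, y_tr = make_data(n_train)
    x_dev, y_dev = make_data(200)
    coefs = np.polyfit(x_tr, y_tr, degree)
    tr_err = np.mean((np.polyval(coefs, x_tr) - y_tr) ** 2)
    dev_err = np.mean((np.polyval(coefs, x_dev) - y_dev) ** 2)
    return tr_err, dev_err

small = train_dev_error(15)    # few examples: easy to memorize noise
large = train_dev_error(500)   # many examples: memorizing noise no longer fits

print(f"n=15 : train={small[0]:.3f}  dev={small[1]:.3f}  gap={small[1]-small[0]:.3f}")
print(f"n=500: train={large[0]:.3f}  dev={large[1]:.3f}  gap={large[1]-large[0]:.3f}")
```

With 15 examples the train error is near zero while the dev error is much larger (high variance); with 500 examples the same degree-9 model can no longer chase the noise, so the train-dev gap shrinks, even though the model class never changed.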