In the video, Dr. Moroney said, “So while the number of accurate predictions increased over time, what was interesting was that the confidence per prediction effectively decreased.” I wonder why confidence drops while more accurate predictions are made. Would someone explain this to me?

Consider a binary classification problem on a dataset with a non-linear decision boundary. At the start of training, predictions are essentially random, and the model may settle for a crude guess, for example assigning every data point to class 1 with probability 1. As training improves the decision boundary, predictions agree more often with the actual class, but points close to the boundary receive probabilities nearer to 0.5 than to the extremes of 0 or 1. Confidence for those points is therefore lower, even though more of them are classified correctly.
See this dataset
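
To make the idea above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the tiny 1-D dataset is made up, the boundary is a simple threshold at x = 0 rather than a non-linear one, and the "fitted" model is just a sigmoid of x, not a network trained as in the video. Confidence is taken to mean max(p, 1 - p) for the predicted probability p.

```python
import math

def sigmoid(z):
    """Map a raw score to a probability of class 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical 1-D dataset: the true class is 1 when x > 0.
xs = [-3.0, -2.0, -1.0, -0.2, 0.2, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

def accuracy(probs, labels):
    """Fraction of points whose thresholded prediction matches the label."""
    preds = [1 if p >= 0.5 else 0 for p in probs]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)

def mean_confidence(probs):
    """Average of max(p, 1 - p), i.e. how sure the model is of its chosen class."""
    return sum(max(p, 1.0 - p) for p in probs) / len(probs)

# Crude early guess: every point is class 1 with probability 1.
naive_probs = [1.0 for _ in xs]

# Stand-in for a better model: probability falls toward 0.5 near x = 0.
fitted_probs = [sigmoid(x) for x in xs]

naive_acc = accuracy(naive_probs, ys)
naive_conf = mean_confidence(naive_probs)
fitted_acc = accuracy(fitted_probs, ys)
fitted_conf = mean_confidence(fitted_probs)

print(f"naive:  accuracy={naive_acc:.2f}, mean confidence={naive_conf:.2f}")
print(f"fitted: accuracy={fitted_acc:.2f}, mean confidence={fitted_conf:.2f}")
```

On this toy data the crude guess is right on half the points while being fully confident, whereas the sigmoid model classifies every point correctly with a mean confidence of roughly 0.78, because the points at x = -0.2 and x = 0.2 get probabilities close to 0.5.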

Thank you for your explanation. Let me make sure that I understand it correctly. The concept sounds similar to overfitting to me. As more predictions are made, the decision boundary shrinks to fit the training data better, but we then have less confidence in points that fall outside or near the boundary. Am I right? I appreciate your patience and time.