How is training done with a 19*19*425 output when the labels are just classes and boxes?

Training in YOLO works just like training in other machine learning algorithms. The labels are first encoded into a target tensor with the same 19*19*425 shape as the network output. The loss function compares the two tensors in some way, and the difference drives iterative parameter adjustments that minimize the error. The details depend on the specific loss function and optimizer (Adam, SGD, etc.), but the basic idea is the same.
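As a sketch of that shape matching, here is how the 19*19*425 figure breaks down, assuming the common layout of 5 anchor boxes per cell and 80 classes (5 * (5 + 80) = 425); the random prediction tensor is just a stand-in for a network output:

```python
import numpy as np

# Hypothetical dimensions matching the 19*19*425 figure from the question:
# 5 anchor boxes, each with 5 box values (p_c, b_x, b_y, b_w, b_h) + 80 class scores.
GRID, ANCHORS, CLASSES = 19, 5, 80
DEPTH = ANCHORS * (5 + CLASSES)  # 425

prediction = np.random.rand(GRID, GRID, DEPTH)  # stand-in for the network output
target = np.zeros((GRID, GRID, DEPTH))          # encoded ground truth, same shape

# Because both tensors share one shape, an elementwise loss can compare them.
assert prediction.shape == target.shape == (19, 19, 425)
```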

Slightly simplified, at each training iteration YOLO compares the ground-truth \hat{p_c} against the predicted p_c (confidence_loss), the ground-truth object center (\hat{b_x}, \hat{b_y}) and shape (\hat{b_w}, \hat{b_h}) against the predicted values (coordinates_loss), and the ground-truth class \hat{c_i} against the predicted class scores (classification_loss). The three loss components can be weighted independently, and their weighted sum is what the optimizer minimizes.
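A minimal sketch of those three components, assuming simple sum-of-squares terms and one box per cell with the layout [p_c, b_x, b_y, b_w, b_h, c_1..c_C]; the real YOLO loss adds anchor matching, IoU-based confidence targets, and a separate no-object weight:

```python
import numpy as np

def yolo_loss_sketch(pred, truth, w_conf=1.0, w_coord=5.0, w_class=1.0):
    # pred, truth: (S, S, 5 + C) tensors, one box per cell, laid out as
    # [p_c, b_x, b_y, b_w, b_h, c_1..c_C]. Simplified sum-of-squares version.
    obj_mask = truth[..., 0]  # 1 where a cell owns an object center, else 0

    # Confidence term: compared everywhere, object or not.
    confidence_loss = np.sum((truth[..., 0] - pred[..., 0]) ** 2)
    # Coordinate and class terms: only counted where an object is present.
    coordinates_loss = np.sum(
        obj_mask[..., None] * (truth[..., 1:5] - pred[..., 1:5]) ** 2)
    classification_loss = np.sum(
        obj_mask[..., None] * (truth[..., 5:] - pred[..., 5:]) ** 2)

    # Independently weighted sum, fed to the optimizer.
    return (w_conf * confidence_loss
            + w_coord * coordinates_loss
            + w_class * classification_loss)
```

With a perfect prediction the loss is zero; any deviation in confidence, coordinates, or class scores raises it according to the corresponding weight.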

Clearly the bounding box values and class label come directly from the training data set. The value of \hat{p_c} for each grid location is derived by computing the bounding box center and converting it to grid cell indices for x and y: it is 1 for cells that contain an object center and 0 for the rest. Hope this helps.