Encoding the anchor boxes

If you look at predict(), I think you can fill in the gap.

The final output from Darknet, the convolutional network used by YOLO, is this layer from the model summary:

conv2d_22 (Conv2D) (None, 19, 19, 425) 435625 ['leaky_re_lu_21[0][0]']

This is yolo_model_outputs in the following code block from predict().

yolo_model_outputs = yolo_model(image_data)
yolo_outputs = yolo_head(yolo_model_outputs, anchors, len(class_names))
out_scores, out_boxes, out_classes = yolo_eval(yolo_outputs, [image.size[1],  image.size[0]], 10, 0.3, 0.5)
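
To make the shapes concrete, here is a minimal sketch of how the 425 channels decompose. This is not code from the notebook; the variable names are mine and the data is random, it only illustrates the layout.

import numpy as np

# Hypothetical raw network output for one image: a 19x19 grid with 425 channels per cell.
feats = np.random.randn(1, 19, 19, 425).astype(np.float32)

# 425 = 5 anchor boxes x (4 box coordinates + 1 confidence + 80 class scores)
num_anchors, num_classes = 5, 80
feats = feats.reshape(1, 19, 19, num_anchors, 5 + num_classes)   # (1, 19, 19, 5, 85)

raw_xy   = feats[..., 0:2]   # t_x, t_y (still need sigmoid + grid-cell offset)
raw_wh   = feats[..., 2:4]   # t_w, t_h (still need exp, scaled by the anchor shape)
raw_conf = feats[..., 4:5]   # objectness logit
raw_cls  = feats[..., 5:]    # 80 class logits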

Then, the next step is yolo_head, which you can find in "./yad2k/models/keras_yolo.py". What you are looking for is in there.
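
It is worth reading that file directly, but roughly speaking the decoding it performs looks like the sketch below. This is a simplified NumPy version I wrote for illustration only; the real yolo_head in yad2k uses Keras backend ops, and its broadcasting details and variable names differ.

import numpy as np

def yolo_head_sketch(feats, anchors, num_classes):
    # feats: (19, 19, 425) raw output; anchors: (5, 2) anchor widths/heights in grid units.
    grid_h, grid_w = feats.shape[:2]
    num_anchors = anchors.shape[0]
    feats = feats.reshape(grid_h, grid_w, num_anchors, 5 + num_classes)

    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    # Offsets of each grid cell, so box centers become coordinates relative to the whole image.
    col = np.broadcast_to(np.arange(grid_w).reshape(1, grid_w, 1, 1), (grid_h, grid_w, num_anchors, 1))
    row = np.broadcast_to(np.arange(grid_h).reshape(grid_h, 1, 1, 1), (grid_h, grid_w, num_anchors, 1))

    # Center: sigmoid keeps it inside its own cell, then add the cell offset and normalize to [0, 1].
    box_xy = (sigmoid(feats[..., 0:2]) + np.concatenate([col, row], axis=-1)) / [grid_w, grid_h]
    # Size: exp of the prediction, scaled by the matching anchor box shape, normalized by grid size.
    box_wh = np.exp(feats[..., 2:4]) * anchors.reshape(1, 1, num_anchors, 2) / [grid_w, grid_h]
    box_confidence = sigmoid(feats[..., 4:5])
    # Class scores go through a softmax to become a probability distribution over the 80 classes.
    exp_cls = np.exp(feats[..., 5:] - feats[..., 5:].max(axis=-1, keepdims=True))
    box_class_probs = exp_cls / exp_cls.sum(axis=-1, keepdims=True)

    return box_xy, box_wh, box_confidence, box_class_probs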

Here is an overview of the object detection/localization steps in YOLO.
Please also see this link by ai_curious for the anchor-box-related operations.

The output from the network includes all candidate boxes. As you can see, the image is split into a 19x19 grid, and each grid cell has 5 anchor boxes. (The center of each anchor box lies inside its grid cell.) Each anchor box carries 4 (position) + 1 (confidence) + 80 (class probability distribution) = 85 values, which is why the channel dimension is 5 x 85 = 425.
yolo_head extracts this information from the network output (19x19x425) and returns a tuple of four tensors: (box_xy, box_wh, box_confidence, box_class_probs).
Then yolo_eval, which you wrote, performs score filtering and non-max suppression to get the final boxes with class information (a rough sketch of that stage follows this overview).
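
For that last stage, here is an illustrative sketch of score filtering followed by non-max suppression. It is not your notebook's yolo_eval; the function name and threshold defaults are placeholders I chose for the example.

import tensorflow as tf

def yolo_eval_sketch(boxes, box_scores, max_boxes=10, score_threshold=0.3, iou_threshold=0.5):
    # boxes: (N, 4) corner coordinates; box_scores: (N, 80) = box_confidence * box_class_probs.
    box_classes = tf.argmax(box_scores, axis=-1)            # best class for each box
    box_class_scores = tf.reduce_max(box_scores, axis=-1)   # score of that best class

    # Step 1: filtering - drop boxes whose best class score is below the threshold.
    mask = box_class_scores >= score_threshold
    boxes = tf.boolean_mask(boxes, mask)
    scores = tf.boolean_mask(box_class_scores, mask)
    classes = tf.boolean_mask(box_classes, mask)

    # Step 2: non-max suppression - remove overlapping boxes that describe the same object.
    keep = tf.image.non_max_suppression(boxes, scores, max_boxes, iou_threshold=iou_threshold)
    return tf.gather(scores, keep), tf.gather(boxes, keep), tf.gather(classes, keep)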

I think the above covers your question. Hope this helps.