Transfer learning with YOLO

I’m looking to modify and fine-tune YOLO for my 2-class object detection task. Since the whole network has more than 50M trainable parameters, which layers are the best choice to unfreeze and train on my data set? And what is the best way to reduce the output size to 19195*2 (since I have just 2 classes)?

I haven’t run this through a compiler to confirm, but my intuition is that you could focus on the very last Conv2D layer. When I built my own YOLO v2 model, I specified the last layers like this:

x = Conv2D(NUM_ANCHORS * (4 + 1 + NUM_CLASSES), (1, 1), strides=(1, 1), padding='same', name='conv_23')(x)
output = Reshape((GRID_H, GRID_W, NUM_ANCHORS, 4 + 1 + NUM_CLASSES), name='yolo_output')(x)

which resulted in this after compilation:

conv_23 (Conv2D)                (None, 19, 19, 48)   49200       dropout[0][0]                    
yolo_output (Reshape)           (None, 19, 19, 8, 6) 0           conv_23[0][0]   

for my configuration of 19x19 grid, 8 anchors, and 1 class.
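To make this concrete for the 2-class case, here is a minimal sketch of rebuilding that last layer and freezing everything else. The small `backbone` conv is a hypothetical stand-in for the pretrained YOLO body; the anchor count, grid size, and layer names follow my configuration above, not anything fixed in YOLO itself:

```python
from tensorflow.keras.layers import Input, Conv2D, Reshape
from tensorflow.keras.models import Model

GRID_H, GRID_W = 19, 19
NUM_ANCHORS = 8
NUM_CLASSES = 2  # the 2-class problem from the question

# Hypothetical stand-in for the pretrained YOLO body (normally loaded with weights)
inputs = Input(shape=(608, 608, 3))
x = Conv2D(16, (3, 3), strides=(32, 32), padding='same', name='backbone')(inputs)

# New detection head: NUM_ANCHORS * (4 box coords + 1 objectness + NUM_CLASSES) filters
x = Conv2D(NUM_ANCHORS * (4 + 1 + NUM_CLASSES), (1, 1), strides=(1, 1),
           padding='same', name='conv_23')(x)
output = Reshape((GRID_H, GRID_W, NUM_ANCHORS, 4 + 1 + NUM_CLASSES),
                 name='yolo_output')(x)
model = Model(inputs, output)

# Freeze everything except the new head, so only conv_23 trains on the new data
for layer in model.layers:
    layer.trainable = (layer.name == 'conv_23')
```

With 2 classes the head comes out as 8 * (4 + 1 + 2) = 56 filters, so the output is (None, 19, 19, 8, 7) instead of the (None, 19, 19, 8, 6) in my 1-class summary.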

You might also look at adding one fully connected layer with the output shape you need.
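For the fully connected route, a rough sketch: the (19, 19, 48) input shape matches the summary above and the 19195*2 target comes from the question, but the pooling layer and activation are illustrative choices, not a tested recipe:

```python
from tensorflow.keras.layers import Input, GlobalAveragePooling2D, Dense, Reshape
from tensorflow.keras.models import Model

# Hypothetical head: collapse the (19, 19, 48) feature map, then one
# fully connected layer sized to the 19195 * 2 output from the question
feat = Input(shape=(19, 19, 48))
x = GlobalAveragePooling2D()(feat)             # (None, 48)
x = Dense(19195 * 2, activation='sigmoid')(x)  # (None, 38390)
out = Reshape((19195, 2))(x)                   # (None, 19195, 2)
head = Model(feat, out)
```

Pooling first keeps the Dense layer to roughly 1.9M parameters; flattening the full 19x19x48 map instead would blow it up to hundreds of millions.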

Let us know if this helps.


Here is another idea, lifted straight from the TensorFlow transfer learning guide:

A last, optional step, is fine-tuning, which consists of unfreezing the entire model you obtained above (or part of it), and re-training it on the new data with a very low learning rate. This can potentially achieve meaningful improvements, by incrementally adapting the pretrained features to the new data.
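In code, that optional fine-tuning step might look like this. The tiny stand-in network and the 1e-5 learning rate are illustrative assumptions; the guide's point is only that the whole model is unfrozen and recompiled with a very low rate:

```python
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D
from tensorflow.keras.models import Model

# Hypothetical small model standing in for a pretrained network
inputs = Input(shape=(32, 32, 3))
x = Conv2D(8, (3, 3), padding='same')(inputs)
model = Model(inputs, x)

# (Phase 1: train only the new head with the base frozen, then...)
# Phase 2: unfreeze everything and recompile with a very low learning rate,
# so the pretrained features adapt gradually without being destroyed
model.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss='mse')
```

Recompiling after flipping `trainable` matters: Keras only picks up the new trainable state at compile time.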