Lesson Learned: model.fit(...shuffle=True...)

One of the things I have been doing to try to understand why my YOLO implementation fails to learn the way I would like is to trace individual records from ground truth through the loss function to specific loss components, such as the coordinates loss. This is nearly impossible to do with model.fit()'s default of shuffle=True, because the order of the records is rearranged on every call to the loss function. There may be good reasons to shuffle when you are actually training in earnest, but I can't recommend it while you are in early or deep debugging of a complex loss function and need to understand what is happening to each record, such as why the coordinates loss is not trending the way you expect. Save yourself some head-scratching and consider setting it to False until you know the implementation is working as expected.

tf.keras.Model.fit()
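
As a minimal sketch of the flag in isolation (the toy model and random data here are just stand-ins for the real YOLO model and custom loss):

import numpy as np
import tensorflow as tf

# Toy data and model purely to illustrate the flag; the real YOLO model and
# custom loss would take their place.
x_train = np.random.rand(64, 8).astype("float32")
y_train = np.random.rand(64, 1).astype("float32")

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(8,))])
model.compile(optimizer="adam", loss="mse")

# shuffle=False keeps the record order identical across epochs, so batch N
# always contains the same records when you trace them through the loss.
model.fit(x_train, y_train, batch_size=16, epochs=2, shuffle=False)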


I usually only work with tf.data.Datasets, which I shuffle deterministically while debugging:

import tensorflow as tf

# Seed the global generator so op-level randomness is reproducible across runs.
tf.random.set_seed(1234)

# Shuffle once with a fixed seed and a buffer covering the whole dataset;
# reshuffle_each_iteration=False keeps the order identical across epochs.
raw_sv_en = raw_sv_en_filtered.shuffle(
    tf.cast(RAW_SV_EN_FILTERED_SIZE, tf.int64),
    seed=1234,
    reshuffle_each_iteration=False
)
raw_sv_en = raw_sv_en.cache()

My initial thought was to do things as simply as possible until I was confident in my implementation and understanding of the loss function, and then scale up to a much larger data set for proper training. I am not convinced it was the right approach, as YOLO is proving quite difficult to train with small and sparse data.

I took a little detour and got my Apple M1 set up to run TensorFlow natively instead of via an emulator - what a pain. I also reduced my image size from 608 to 416 and my grid size from 19 to 13, and wrote an augmenter that moves my crop window all around the original image. Maybe it is now time to combine all these steps with the full data set and use TensorFlow's built-in support for truly large data.
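
The augmenter boils down to something like this rough sketch (the function name is mine, and a real version also has to remap the ground-truth boxes to the chosen window):

import tensorflow as tf

# Rough sketch: pick a random 416x416 window out of the original image.
# A real augmenter must also shift and clip the ground-truth boxes to match.
def random_crop_window(image, crop_size=416):
    return tf.image.random_crop(image, size=(crop_size, crop_size, 3))

# Example: crop a random window from a 608x608 image.
image = tf.random.uniform((608, 608, 3))
patch = random_crop_window(image)  # shape (416, 416, 3)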


Yes, you most likely want to set up an optimized input pipeline to benefit from both CPU and GPU processing. I develop my spare-time models in Google Colab. Usually I get Tesla K80s, but if you are lucky, you will get a Tesla T4 :rocket: I know Apple M1 are amazing for deep learning, but they don't beat a Tesla T4 :stuck_out_tongue:
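
Something along these lines, for example; the preprocess function and toy tensors are placeholders for your real data and augmentation:

import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

# Toy stand-ins for real images and labels.
images = tf.random.uniform((64, 416, 416, 3))
labels = tf.random.uniform((64, 1))

def preprocess(image, label):
    # Placeholder for real decoding/augmentation work done on CPU threads.
    return tf.image.per_image_standardization(image), label

dataset = (
    tf.data.Dataset.from_tensor_slices((images, labels))
    .map(preprocess, num_parallel_calls=AUTOTUNE)  # parallel CPU preprocessing
    .batch(8)
    .prefetch(AUTOTUNE)  # prepare the next batches while the GPU trains
)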

I think part of your sentence got chopped off. I'm sure it originally read "…for a chip that fits into an ultralight laptop, generates very little heat, and has extraordinarily good battery life, Apple M1 are amazing for deep learning…", with which I would generally agree. I have trained models while on the 2+ hour drive to visit my kids, which at this point in my life is more useful than true computing horsepower.

You are right. It was chopped off :grin: Sounds great!