Lesson Learned: model.fit(...shuffle=True...)

One of the things I have been doing to try to understand why my YOLO implementation fails to learn the way I would like is to trace individual records from ground truth through the loss function to specific loss components, such as the coordinates loss. This is nearly impossible to do with model.fit()'s default of shuffle=True, because the order of the records is rearranged on every call to the loss function. There may be good reasons to shuffle when you are actually training in earnest, but I can't recommend it while you are in early or deep debugging of a complex loss function and need to understand what is happening to each record, such as why the coordinates loss is not trending the way you expect. Save yourself some head-scratching and consider setting it to False until you know the implementation is working as expected.

tf.keras.Model.fit()
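
As a minimal sketch of the flag in isolation (the toy model and random data here are just stand-ins for the real YOLO model and custom loss):

import numpy as np
import tensorflow as tf

# Toy data and model purely to illustrate the flag; the real YOLO model and
# custom loss would take their place.
x_train = np.random.rand(64, 8).astype("float32")
y_train = np.random.rand(64, 1).astype("float32")

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(8,))])
model.compile(optimizer="adam", loss="mse")

# shuffle=False keeps the record order identical across epochs, so batch N
# always contains the same records when you trace them through the loss.
model.fit(x_train, y_train, batch_size=16, epochs=2, shuffle=False)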


I usually only work with tf.data.Datasets, which I shuffle deterministically while debugging:

import tensorflow as tf

# Seed the global generator so op-level randomness is reproducible across runs.
tf.random.set_seed(1234)

# Shuffle once with a fixed seed and a buffer covering the whole dataset;
# reshuffle_each_iteration=False keeps the order identical across epochs.
raw_sv_en = raw_sv_en_filtered.shuffle(
    tf.cast(RAW_SV_EN_FILTERED_SIZE, tf.int64),
    seed=1234,
    reshuffle_each_iteration=False
)
raw_sv_en = raw_sv_en.cache()

My initial thought was to do things as simply as possible until I was confident in my implementation and understanding of the loss function, and then scale up to a much larger data set for proper training. I am not convinced it was the right approach, as YOLO is proving quite difficult to train with small and sparse data.

I took a little detour and got my Apple M1 set up to run TensorFlow natively instead of via an emulator - what a pain. I also reduced my image size from 608 to 416 and my grid size from 19 to 13, and wrote an augmenter that moves my crop window all around the original image. Maybe it is now time to combine all these steps with the full data set and use TensorFlow's built-in support for truly large data.
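
The augmenter boils down to something like this rough sketch (the function name is mine, and a real version also has to remap the ground-truth boxes to the chosen window):

import tensorflow as tf

# Rough sketch: pick a random 416x416 window out of the original image.
# A real augmenter must also shift and clip the ground-truth boxes to match.
def random_crop_window(image, crop_size=416):
    return tf.image.random_crop(image, size=(crop_size, crop_size, 3))

# Example: crop a random window from a 608x608 image.
image = tf.random.uniform((608, 608, 3))
patch = random_crop_window(image)  # shape (416, 416, 3)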


Yes, you most likely want to set up an optimized input pipeline to benefit from both CPU and GPU processing. I develop my spare-time models in Google Colab. Usually I get Tesla K80s, but if you are lucky, you will get a Tesla T4 :rocket: I know Apple M1 are amazing for deep learning, but they don't beat a Tesla T4 :stuck_out_tongue:
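
Something along these lines, for example; the preprocess function and toy tensors are placeholders for your real data and augmentation:

import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

# Toy stand-ins for real images and labels.
images = tf.random.uniform((64, 416, 416, 3))
labels = tf.random.uniform((64, 1))

def preprocess(image, label):
    # Placeholder for real decoding/augmentation work done on CPU threads.
    return tf.image.per_image_standardization(image), label

dataset = (
    tf.data.Dataset.from_tensor_slices((images, labels))
    .map(preprocess, num_parallel_calls=AUTOTUNE)  # parallel CPU preprocessing
    .batch(8)
    .prefetch(AUTOTUNE)  # prepare the next batches while the GPU trains
)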

I think part of your sentence got chopped off. I'm sure it originally read "…for a chip that fits into an ultralight laptop, generates very little heat, and has extraordinarily good battery life, Apple M1 are amazing for deep learning…", with which I would generally agree. I have trained models while on the 2+ hour drive to visit my kids, which at this point in my life is more useful than true computing horsepower.

You are right. It was chopped off :grin: Sounds great!