Quick question about Model Checkpoint save_freq

Hi,

I'm trying to understand the save-and-load part of the TensorFlow documentation: Save and load models | TensorFlow Core

I don’t understand why the save_freq parameter of ModelCheckpoint is set to 5*batch_size to save every 5 epochs.
Why does the batch size appear in that expression?

It's certainly something really simple, but I would love to get your insights. Thank you!

# Include the epoch in the file name (uses `str.format`)
checkpoint_path = "training_2/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

batch_size = 32

# Create a callback that saves the model's weights every 5 epochs
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path, 
    verbose=1, 
    save_weights_only=True,
    save_freq=5*batch_size)

# Create a new model instance
model = create_model()

# Save the weights using the `checkpoint_path` format
model.save_weights(checkpoint_path.format(epoch=0))

# Train the model with the new callback
model.fit(train_images, 
          train_labels,
          epochs=50, 
          batch_size=batch_size, 
          callbacks=[cp_callback],
          validation_data=(test_images, test_labels),
          verbose=0)

Please move your post to the right course / week.

It's not specific to a particular module; that's why I posted it here. Thanks!

The number of training images is 1000, so with a batch size of 32 there will be ceil(1000/32) = ceil(31.25) = 32 batches per epoch. With save_freq set to 5 * batch_size = 160, the save operation happens after every 160 batches of data, which here equals 5 epochs (the batch size, 32, happens to match the number of batches per epoch).

save_freq refers to the number of batches to wait before checkpointing. See this link.
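
To make that intent explicit, you can compute the number of batches per epoch instead of relying on the coincidence that it equals the batch size here. A rough sketch (assuming the tutorial's 1000 training images; num_train_images and steps_per_epoch are just illustrative names):

import math
import tensorflow as tf

num_train_images = 1000  # size of the tutorial's training set (assumed)
batch_size = 32

# save_freq counts batches, so turn "save every 5 epochs" into a batch count
steps_per_epoch = math.ceil(num_train_images / batch_size)  # ceil(31.25) = 32

cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath="training_2/cp-{epoch:04d}.ckpt",
    verbose=1,
    save_weights_only=True,
    save_freq=5 * steps_per_epoch)  # 160 batches = 5 epochs

Written this way, the checkpoint still fires every 5 epochs even if you later change batch_size or the size of the training set.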

Moved your post to the General Discussions topic.

Thank you! Got it! 🙂