Error Running TensorFlow Model

Hello

I’m facing an issue while running a TensorFlow model and need some advice on troubleshooting it.

When trying to train my TensorFlow model, I got the following error message: “ResourceExhaustedError: OOM when allocating tensor with shape [batch_size, num_features] and type float on GPU.”
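For anyone reading the error message above: the shape in it can be turned into a rough size estimate. The following is only a back-of-the-envelope sketch. It assumes "float" means 32-bit floats (4 bytes per element), and the batch size and feature count are made-up placeholders, not values from my model.

```python
from math import prod

# Bytes per element for a few common dtypes (TensorFlow's "float" is float32).
BYTES_PER_ELEMENT = {"float16": 2, "float32": 4, "float64": 8}


def tensor_bytes(shape, dtype="float32"):
    """Return the raw size in bytes of one dense tensor with the given shape."""
    return prod(shape) * BYTES_PER_ELEMENT[dtype]


# Hypothetical values standing in for [batch_size, num_features].
batch_size, num_features = 4096, 10_000
size = tensor_bytes((batch_size, num_features))
print(f"{size} bytes, about {size / 2**20:.1f} MiB for a single tensor")
```

This counts one tensor only. During training, activations, gradients and optimizer state each add further tensors, so the real footprint is usually several times larger.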

I’m using TensorFlow version 2.5.0 and running the model on a GPU-enabled machine with sufficient memory. Despite this, I keep hitting out-of-memory errors during training, which prevent the model from completing successfully.

I’ve tried reducing the batch size and optimizing the model architecture to reduce memory usage. I’ve also monitored GPU memory usage during training and confirmed that there’s available memory before the error occurs. I’ve checked for any memory leaks in my code but haven’t found any issues.

I’m asking for help on how to resolve this out-of-memory error in TensorFlow. Any suggestions would be greatly appreciated.

Thank you for your help!

Thank you
stevediaz

Please also share a link to your code, so others can see why you are running into this issue. If this is about suppressing TensorFlow log and error output, then look for silence.tensorflow.

It’s possible your code is missing some key elements for memory usage and control.
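One memory-control setting that may be worth trying is GPU memory growth, which asks TensorFlow to allocate GPU memory as needed instead of reserving most of it up front. This is a minimal sketch, not a guaranteed fix. The function name enable_memory_growth is made up for illustration, and the import guard is only there so the snippet does nothing on a machine without TensorFlow.

```python
def enable_memory_growth():
    """Turn on memory growth for every visible GPU; return the GPU names."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed here, so there is nothing to configure.
        return []

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
        except RuntimeError as err:
            # Raised if the GPUs were already initialized before this call.
            print(f"Could not change memory growth for {gpu.name}: {err}")
    return [gpu.name for gpu in gpus]


print(enable_memory_growth())
```

Call this at the very start of the program, before building the model or creating any tensors, because the setting cannot be changed once the GPU has been initialized.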

Or maybe your code is using up some other memory besides the GPU's, such as host RAM.
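To check the host-memory side, Python's standard tracemalloc module can report how much memory a step allocates. This is a rough sketch under assumptions: build_batch is a hypothetical stand-in for your data-loading step, and tracemalloc only sees allocations made through Python's allocator, so it will not show memory that TensorFlow allocates natively or on the GPU.

```python
import tracemalloc


def peak_host_allocation(fn, *args, **kwargs):
    """Run fn and return (result, peak bytes allocated by Python during the call)."""
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def build_batch(n):
    # Hypothetical stand-in for a data-loading or preprocessing step.
    return [float(i) for i in range(n)]


batch, peak = peak_host_allocation(build_batch, 100_000)
print(f"peak Python allocation: {peak / 2**20:.2f} MiB")
```

If the peak keeps rising when the same step is measured repeatedly, something on the Python side is probably holding on to data between iterations.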

However:

TensorFlow 2.5 is almost 3 years old.

You might scroll through the release notes for newer versions, and see if there were any fixes that addressed memory usage issues.

https://www.tensorflow.org/versions

Thanks for sharing, this is helpful.