Error Running TensorFlow Model

stevediaz · April 24, 2024, 8:54am

Hello

I’m facing an issue while running a TensorFlow model and need some troubleshooting advice.

When trying to train my TensorFlow model, I got the following error message: “ResourceExhaustedError: OOM when allocating tensor with shape [batch_size, num_features] and type float on GPU.”

I’m using TensorFlow version 2.5.0 and running the model on a GPU-enabled machine with sufficient memory. Despite this, I’m still getting out-of-memory errors during training, which prevent the model from completing successfully.

I’ve tried reducing the batch size and optimizing the model architecture to reduce memory usage. I’ve also monitored GPU memory usage during training and confirmed that there’s available memory before the error occurs. I’ve checked for any memory leaks in my code but haven’t found any issues.
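For reference, the size of the tensor named in the error can be estimated directly from its shape. The sketch below is plain Python, not a TensorFlow API; it assumes "type float" in the message means 32-bit floats (4 bytes per element), and the `num_features` and `batch_size` values are hypothetical, since the thread does not give the real ones.

```python
# Bytes per element for a few common dtypes ("float" in the OOM message is float32).
BYTES_PER_ELEMENT = {"float16": 2, "float32": 4, "float64": 8}


def tensor_bytes(shape, dtype="float32"):
    """Return the bytes needed to hold one dense tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * BYTES_PER_ELEMENT[dtype]


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 ** 2)


# Hypothetical sizes for illustration only.
num_features = 4096
for batch_size in (256, 128, 64):
    size = tensor_bytes((batch_size, num_features))
    print(f"batch_size={batch_size}: {to_mib(size):.1f} MiB")
```

Halving the batch size halves this one tensor, but the error only names the allocation that happened to fail. Weights, gradients, optimizer state, and the activations of every layer share the same GPU memory, so a small tensor can still be the one that tips it over.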

I’m asking for help on how to resolve this out-of-memory error in TensorFlow. Any suggestions would be greatly appreciated.

Thank you for your help!

Thank you
stevediaz

Deepti_Prasad · April 24, 2024, 10:07am

Kindly also provide a link to your code, so that others can see why you are facing this issue. If this is about not wanting to see TensorFlow's log and error output, then look for silence.tensorflow.

TMosh · April 24, 2024, 4:59pm

It’s possible your code is missing some key elements for memory usage and control.

Or maybe your code is using up some other memory besides the GPU's.

However:

TensorFlow 2.5 is almost 3 years old.

You might scroll through the release notes for newer versions, and see if there were any fixes that addressed memory usage issues.

https://www.tensorflow.org/versions
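One example of the kind of memory control mentioned above is GPU memory growth. By default TensorFlow reserves nearly all GPU memory at startup; with memory growth enabled it allocates as needed instead. This is a minimal sketch, not a guaranteed fix for the error in this thread. The helper name `enable_memory_growth` is made up for illustration, and the import guard is only there so the snippet also runs on a machine without TensorFlow.

```python
try:
    import tensorflow as tf
except ImportError:  # lets the sketch run where TensorFlow is not installed
    tf = None


def enable_memory_growth():
    """Ask TensorFlow to grow GPU memory on demand; return the GPUs found."""
    if tf is None:
        return []
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
        except RuntimeError as err:
            # Raised if the GPU was already initialized before this call.
            print(f"Could not set memory growth: {err}")
    return gpus


# Call this before building the model or creating any tensors.
gpus = enable_memory_growth()
print(f"Memory growth requested for {len(gpus)} GPU(s)")
```

The call has to happen at the top of the script, before any operation touches the GPU. It does not add memory; it only changes how TensorFlow claims it, which can help when other processes share the same GPU.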

stevediaz · April 25, 2024, 10:06am

Thanks for sharing, that's helpful.
