Thank you for the suggestion.
Yes, I installed the Apple-specific TensorFlow build using Miniforge 3. I am currently using TensorFlow 2.7.0 with Python 3.8.12, and I have checked that the GPU is activated.
I forced the model to run on the M1 GPU with the following code (I am not sure this is correct):
```python
import json

import tensorflow as tf

# Log the device each op is placed on.
tf.debugging.set_log_device_placement(True)

# Pin everything below to the M1 GPU.
with tf.device('/GPU:0'):
    base_dir = HOME_DIR + "processed/"
    with open(base_dir + "config.json") as json_file:
        config = json.load(json_file)

    model.fit_generator(generator=train_generator,
                        steps_per_epoch=steps_per_epoch,
                        epochs=n_epochs,
                        use_multiprocessing=True,
                        validation_data=valid_generator,
                        validation_steps=validation_steps)
```
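For reference, the GPU can be confirmed visible with the standard tf.config API (a minimal check; the exact device name may differ on other setups):

```python
import tensorflow as tf

# With the Metal plugin installed, this should list one GPU entry.
print(tf.config.list_physical_devices('GPU'))
# e.g. [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
```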
However, some ops are still being placed on the CPU, and the NDHWC problem is not solved:
```
Executing op ReadVariableOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Identity in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op __inference_train_function_6275 in device /job:localhost/replica:0/task:0/device:GPU:0
2022-01-31 22:42:09.330428: I tensorflow/core/common_runtime/placer.cc:114] args_0: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
2022-01-31 22:42:09.330439: I tensorflow/core/common_runtime/placer.cc:114] GeneratorDataset: (GeneratorDataset): /job:localhost/replica:0/task:0/device:CPU:0
INVALID_ARGUMENT: Conv3DBackpropInputOpV2 only supports NDHWC on the CPU.
```
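As far as I understand, the `args_0` and `GeneratorDataset` placements are expected, since tf.data input pipelines always run on the CPU; the real blocker seems to be the Conv3D gradient op falling back to a CPU kernel that only supports the channels-last (NDHWC) layout. I am not sure this applies to my model, but if a model sets `data_format='channels_first'` anywhere, switching to `'channels_last'` produces NDHWC tensors. A minimal sketch (the layer and its parameters here are hypothetical, just to show the argument):

```python
from tensorflow import keras

# Hypothetical layer: 'channels_last' corresponds to NDHWC
# (batch, depth, height, width, channels), the only layout the
# CPU kernel for Conv3DBackpropInputV2 accepts.
deconv = keras.layers.Conv3DTranspose(
    filters=32,
    kernel_size=3,
    padding='same',
    data_format='channels_last',  # 'channels_first' would mean NCDHW
)
```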
Interesting. I found some threads out on the web suggesting that some developers deliberately force code to run on the CPU instead of the GPU for specific performance reasons. For example: https://github.com/mauriceqch/pcc_geo_cnn/issues/4
Have you tried running the whole thing on CPU only? Maybe running slower is preferable to not running at all?
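If you want to try that, a minimal sketch using the standard tf.config API (this has to run before any tensors, ops, or models are created):

```python
import tensorflow as tf

# Hide all GPUs from TensorFlow so every op is placed on the CPU.
# Must be called before any other TensorFlow work.
tf.config.set_visible_devices([], 'GPU')

print(tf.config.get_visible_devices())  # should now list only CPU devices
```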