Hi everyone, I have been facing this issue while submitting the birds.h5 file. I have retrained my model and downloaded the h5 file from the Colab session multiple times, but whenever I upload the file the grader gives me 0/100 with the output “Your model could not be loaded. Make sure it is a valid h5 file.” I have tried 7 to 8 times without success. Can anyone help me resolve this issue? I’d really appreciate it.
Hi @Hafiz_Uzair,
We’ve seen this error in the past when there is a mismatch between the tensorflow version used to create the birds.h5 file and the version used by the grader. Is it possible you have an old version of the assignment? The current version has a cell in section 0.5 Imports that looks like this:
# Install packages for compatibility with the autograder
!pip install tensorflow==2.8.0
!pip install keras==2.8.0
This will set the tensorflow version to one that works with the grader. If you’re not using this version of the assignment, please switch to the current version, or try copying these lines into your code.
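To check that the install cell took effect, something like the following sketch can be run in a later cell. It uses only the standard library; the `EXPECTED` dict and the helper names are made up for illustration, with the pins copied from the cell above. It reports what pip has installed on disk, so after a downgrade the runtime usually needs a restart before `import tensorflow` picks up the new version.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the assignment's install cell (illustrative constant name).
EXPECTED = {"tensorflow": "2.8.0", "keras": "2.8.0"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose installed version differs from its pin to (pinned, found)."""
    return {
        name: (pinned, lookup(name))
        for name, pinned in expected.items()
        if lookup(name) != pinned
    }


if __name__ == "__main__":
    mismatches = find_mismatches(EXPECTED)
    for name, (pinned, found) in mismatches.items():
        print(f"{name}: expected {pinned}, found {found}")
    if not mismatches:
        print("Installed versions match the pins.")
```

If a mismatch is printed, rerunning the install cell and restarting the runtime is the first thing to try.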
One other issue that can cause this error is using the experimental version of SGD in exercise 5. In the current version of the code, there is a comment to explain this:
When using SGD, set the momentum to 0.9 and keep the default learning rate. (Note: To avoid grading issues, please use tf.keras.optimizers.SGD instead of tf.keras.optimizers.experimental.SGD. We will remove this note once the grader has been updated to recognize the experimental module.).
Hopefully, one of these will be the issue you are seeing.
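For reference, the optimizer described in that note could be built roughly as below. This is only a sketch: the function name `make_sgd` is hypothetical, and TensorFlow is imported inside the function so the cell can be pasted anywhere after the install cell.

```python
def make_sgd():
    """Build the non-experimental SGD optimizer with momentum 0.9 and the default learning rate."""
    # Imported lazily so this cell does not fail before TensorFlow is installed.
    import tensorflow as tf

    # tf.keras.optimizers.SGD, not tf.keras.optimizers.experimental.SGD.
    return tf.keras.optimizers.SGD(momentum=0.9)
```

The returned object would then be passed as `optimizer=` to `model.compile`.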
Yeah, I figured it out. It was due to the tensorflow version. However, I really appreciate your response. Thank you, Cheers!!
I am already using TensorFlow and Keras 2.8.0. However, when I run the notebooks, my GPU RAM does not seem to be used at all, because its usage never increases. I’m sure the problem lies in Colab’s CUDA and cuDNN versions, which are no longer compatible with TensorFlow and Keras 2.8.0. If that is the case, what do you suggest?
The training process took a very long time, approximately 10 hours.
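One way to confirm whether TensorFlow can see the GPU at all, rather than inferring it from the RAM graph, is a check along these lines. It is a sketch: `visible_gpus` is a made-up helper name, while `tf.config.list_physical_devices` is the standard TensorFlow 2.x call.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment.")
elif gpus:
    print("GPUs visible to TensorFlow:", gpus)
else:
    print("No GPU visible to TensorFlow; training will run on the CPU.")
```

An empty list on a GPU runtime would be consistent with a CUDA/cuDNN mismatch, although it does not prove that this is the cause.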
I’m having the same problem. Colab is configured to use a GPU, but it seems only the CPU is being used.
Have you been able to fix it?
Thanks!
I have just confirmed. If I do not run this cell:
# Install packages for compatibility with the autograder
!pip install tensorflow==2.8.0
!pip install keras==2.8.0
The training performs at a reasonable speed:
Epoch 1/50
94/94 [==============================] - 72s 531ms/step - loss: 0.0627 - mse: 0.0627 - val_loss: 0.2929 - val_mse: 0.2929
Epoch 2/50
94/94 [==============================] - 47s 502ms/step - loss: 0.0117 - mse: 0.0117 - val_loss: 0.2158 - val_mse: 0.2158
Epoch 3/50
94/94 [==============================] - 48s 509ms/step - loss: 0.0082 - mse: 0.0082 - val_loss: 0.1437 - val_mse: 0.1437
Hi @Wendy,
After several tests, what I’ve found is that if I do not run the lines:
# Install packages for compatibility with the autograder
!pip install tensorflow==2.8.0
!pip install keras==2.8.0
I can fit the model with a reasonable time for each epoch, the model is properly trained, but if I submit the birds.h5 file, I receive the message “Your model could not be loaded. Make sure it is a valid h5 file.”
Epoch 1/50
94/94 [===] - 72s 531ms/step - loss: 0.0627 - mse: 0.0627
Epoch 2/50
94/94 [===] - 47s 502ms/step - loss: 0.0117 - mse: 0.0117
Epoch 3/50
94/94 [===] - 48s 509ms/step - loss: 0.0082 - mse: 0.0082
16/16 [==============================] - 3s 90ms/step
Number of predictions where iou > threshold(0.5): 280
Number of predictions where iou < threshold(0.5): 220
But if I run that cell and downgrade TensorFlow, the GPU is not used: instead of ~60 s, each epoch takes ~1200 s, and Colab times out before training finishes, so I cannot generate the birds.h5 file or submit it for evaluation.
I have prepared the code, trained my model, and confirmed that 56% of the images have an IoU score of at least 0.5, but I cannot pass the assignment…
What should I do now?
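To put the two per-epoch times in perspective, here is a rough back-of-the-envelope calculation. It assumes all 50 epochs shown in the logs run at a constant speed; the function name is just for illustration.

```python
EPOCHS = 50  # from "Epoch 1/50" in the training logs


def total_training_hours(seconds_per_epoch, epochs=EPOCHS):
    """Estimate total training time in hours at a constant per-epoch speed."""
    return seconds_per_epoch * epochs / 3600


with_gpu = total_training_hours(60)       # ~60 s per epoch
without_gpu = total_training_hours(1200)  # ~1200 s per epoch after the downgrade

print(f"With GPU:    about {with_gpu:.1f} hours")     # about 0.8 hours
print(f"Without GPU: about {without_gpu:.1f} hours")  # about 16.7 hours
```

At the slower rate a full run would need well over half a day, which fits with the session timing out first.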
I have exactly the same problem. Please let me know if there is a solution. Thanks!!
I’m not sure exactly what problem you are facing, but if you are reaching the GPU limit before training completes, you can use a distributed strategy (mirrored strategy). With it, your GPU should not reach its limit, since the data is trained in parallel pipelines. If you still run into any problem, feel free to ask.
Note: Make sure you first install version 2.8.0 of TensorFlow, because the grader is only compatible with this version.
Hi @Hafiz_Uzair, thanks for your reply.
The problem I’m having (and with C3W3 assignment too) is that if I install the 2.8.0 version, the GPU is not used.
With all the same code but not installing the old version, the GPU is used, so I think there is a problem with that version and the current version of Colab.
Hi @Jorge_Murria, I’ve previously encountered the same issue. I suggest training with TensorFlow 2.15 first and then, after saving the model, retraining it with TensorFlow 2.8. For the retraining step, set trainable=True for the last layer (model.layers[-1:]) and freeze the other layers. This workaround is needed because Colab notebooks no longer support GPU acceleration with TensorFlow 2.8, and it should help reduce training time. Additionally, for local development, consider installing CUDA 11.2 and cuDNN 8.1 and running TensorFlow 2.8.
Hope this helps!
Hi @Habibi_Ahmadi_Muslim, I will try it. It seems like a good workaround until the grader is updated to work with the latest TensorFlow version.
Thanks!
Hi Habibi, I have followed your instructions, but the grader still gives the same message:
Your model could not be loaded. Make sure it is a valid h5 file.
But the model is working properly:
Number of predictions where iou > threshold(0.5): 281
Number of predictions where iou < threshold(0.5): 219
What else can I do?
Hello, I have successfully solved this assignment by following these steps:
1. If you are working on a local device, please install the following packages:
   !pip install tensorflow==2.8.3
   !pip install keras==2.8.0
   !pip install tensorflow_datasets==4.9.2
   !pip install protobuf==3.20.*
   !pip install numpy==1.23.5
   !pip install Pillow==9.4.0
   !pip install matplotlib==3.7.1
   !pip install tqdm==4.66.1
   !pip install pandas==1.5.3
   !pip install scipy==1.11.4
   !pip install tensorflow-hub==0.12.0
   !pip install opencv-python==4.8.*
2. Adjust the bash kernel according to your operating system. If you are using Windows, run the bash kernel manually in PowerShell to update incompatible packages simultaneously.
3. Initialize a new model using tf.keras.models.load_model("your_model_path", compile=False).
4. Continue by calling model.compile again, and adjust anything else needed to meet the assessment’s rules.
I hope this helps!
Hi @Habibi_Ahmadi_Muslim,
Thanks for your help, but I haven’t been able to install TensorFlow locally.
I’m only working in Colab.
Okay, you can ignore steps 1 and 2. Please start from step 3. One thing I forgot to explain earlier is that after running the kernel to prepare the dataset, you should immediately load the model that you trained and saved previously.
I provide the following reference for configuring the model.h5 after loading it:
model.trainable = False
NUM_LAYERS = 1
for layer in model.layers[-NUM_LAYERS:]:
    layer.trainable = True
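Putting steps 3 and 4 together with the freezing idea, the whole reload could look roughly like the sketch below. Several things here are assumptions: the helper names are hypothetical, the loss is taken to be MSE because the training logs report `loss` and `mse` with identical values, and the optimizer follows the SGD note quoted earlier in the thread. The sketch sets the `trainable` flag layer by layer, leaving the model-level flag untouched. It is not guaranteed that a file saved under a newer TensorFlow loads cleanly under 2.8.

```python
def freeze_all_but_last(model, num_trainable=1):
    """Mark only the last `num_trainable` layers as trainable; return their names."""
    layers = list(model.layers)
    cutoff = len(layers) - num_trainable
    for index, layer in enumerate(layers):
        layer.trainable = index >= cutoff
    return [layer.name for layer in layers if layer.trainable]


def load_for_finetuning(path, num_trainable=1):
    """Load a saved model without its compile state, freeze it, and recompile."""
    # Imported lazily so the helper above can be used without TensorFlow.
    import tensorflow as tf

    model = tf.keras.models.load_model(path, compile=False)
    freeze_all_but_last(model, num_trainable)
    # Assumed settings: SGD with momentum 0.9 and MSE loss, as seen in the thread.
    model.compile(optimizer=tf.keras.optimizers.SGD(momentum=0.9), loss="mse")
    return model
```

A call such as `model = load_for_finetuning("your_model_path")` would then be followed by a short `model.fit(...)` run and `model.save("birds.h5")` under TensorFlow 2.8.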
Hi @Habibi_Ahmadi_Muslim, did you solve the zombie detector problem (the week 2 graded assignment)?
Hi @Hafiz_Uzair, I have successfully completed all assignments from C3 and didn’t encounter any major issues, except for assignments W1 and W4.