CC3_W1_Assignment birds.h5 issue

Hafiz_Uzair · November 18, 2023, 1:46pm

Hi everyone, I have been facing this issue while submitting the birds.h5 file. I have re-trained my model multiple times and downloaded the h5 file multiple times from the Colab session. But every time I upload the file, the grader gives me 0/100 with the output “Your model could not be loaded. Make sure it is a valid h5 file.” I have tried 7 to 8 times, but it won’t work for some reason. Can anyone help me resolve this issue? I’ll really appreciate it.

Wendy · November 21, 2023, 1:40am

Hi @Hafiz_Uzair,
We’ve seen this error in the past when there is a mismatch between the tensorflow version used to create the birds.h5 file and the version used by the grader. Is it possible you have an old version of the assignment? The current version has a cell in section 0.5 Imports that looks like this:

# Install packages for compatibility with the autograder
!pip install tensorflow==2.8.0
!pip install keras==2.8.0

This will set the tensorflow version to one that works with the grader. If you’re not using this version of the assignment, please switch to the current version, or try copying these lines into your code.

One other issue that can cause this error is using the experimental version of SGD in exercise 5. In the current version of the code, there is a comment to explain this:

When using SGD, set the momentum to 0.9 and keep the default learning rate. (Note: To avoid grading issues, please use tf.keras.optimizers.SGD instead of tf.keras.optimizers.experimental.SGD. We will remove this note once the grader has been updated to recognize the experimental module.).

Hopefully, one of these will be the issue you are seeing.
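
A quick way to confirm which versions are installed before training and saving is sketched below. The helper names `matches_pin` and `check_pins` are hypothetical, and the 2.8.0 pins are taken from the install cell quoted above. Note that this reports what pip has installed, which can differ from what an already-running session has imported, so it is best run after restarting the runtime.

```python
from importlib import metadata

# Pins taken from the assignment's install cell quoted above.
REQUIRED = {"tensorflow": "2.8.0", "keras": "2.8.0"}


def matches_pin(installed: str, required: str) -> bool:
    """Return True when the installed version equals the pinned one exactly."""
    return installed.strip() == required.strip()


def check_pins(required=REQUIRED):
    """Map each package name to (installed version or None, pin satisfied?)."""
    report = {}
    for name, pin in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        ok = installed is not None and matches_pin(installed, pin)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_pins().items():
        print(f"{name}: installed={installed} pinned={REQUIRED[name]} ok={ok}")
```

If either line prints `ok=False`, the saved birds.h5 was probably produced by a different version than the one the grader expects.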

Hafiz_Uzair · November 21, 2023, 10:42am

Yeah, I figured it out. It was due to the tensorflow version. However, I really appreciate your response. Thank you, Cheers!!

Habibi_Ahmadi_Muslim · December 18, 2023, 3:27am

I am already using TensorFlow and Keras version 2.8.0. However, when running the whole notebook, it looks as if my GPU RAM is not being used at all, because it doesn’t show an increase. I’m sure the problem lies in Colab’s CUDA and cuDNN versions, which are no longer compatible with TensorFlow and Keras 2.8.0. If that is the case, what do you suggest?

Habibi_Ahmadi_Muslim · December 18, 2023, 3:28am

The training process took a very long time, approximately 10 hours.

Jorge_Murria · December 26, 2023, 7:53am

I’m having the same problem. Colab is configured to use a GPU, but it seems only the CPU is being used.
Have you been able to fix it?
Thanks!

Jorge_Murria · December 26, 2023, 8:09am

I have just confirmed. If I do not run this cell:

# Install packages for compatibility with the autograder
!pip install tensorflow==2.8.0
!pip install keras==2.8.0

The training performs at a reasonable speed:
Epoch 1/50
94/94 [==============================] - 72s 531ms/step - loss: 0.0627 - mse: 0.0627 - val_loss: 0.2929 - val_mse: 0.2929
Epoch 2/50
94/94 [==============================] - 47s 502ms/step - loss: 0.0117 - mse: 0.0117 - val_loss: 0.2158 - val_mse: 0.2158
Epoch 3/50
94/94 [==============================] - 48s 509ms/step - loss: 0.0082 - mse: 0.0082 - val_loss: 0.1437 - val_mse: 0.1437

Jorge_Murria · December 26, 2023, 9:26am

Hi @Wendy ,
After several tests, what I’ve found is that if I do not run the lines:

# Install packages for compatibility with the autograder
!pip install tensorflow==2.8.0
!pip install keras==2.8.0

I can fit the model with a reasonable time for each epoch, the model is properly trained, but if I submit the birds.h5 file, I receive the message “Your model could not be loaded. Make sure it is a valid h5 file.”
Epoch 1/50
94/94 [===] - 72s 531ms/step - loss: 0.0627 - mse: 0.0627
Epoch 2/50
94/94 [===] - 47s 502ms/step - loss: 0.0117 - mse: 0.0117
Epoch 3/50
94/94 [===] - 48s 509ms/step - loss: 0.0082 - mse: 0.0082

16/16 [==============================] - 3s 90ms/step
Number of predictions where iou > threshold(0.5): 280
Number of predictions where iou < threshold(0.5): 220

But if I run the cell and downgrade the TensorFlow version, the GPU is not used: instead of ~60 s, each epoch takes ~1200 s, and Colab times out before training finishes, so I cannot generate the birds.h5 file or submit it for evaluation.

I have prepared the code, trained my model, and confirmed that 56% of the images have an IoU score greater than or equal to 0.5, but I cannot pass the assignment…

What should I do now?
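
To put the two epoch times in perspective, here is a rough back-of-the-envelope calculation. It assumes all 50 epochs run (the logs show "Epoch 1/50") and that the per-epoch time stays roughly constant; the function name is only illustrative.

```python
EPOCHS = 50  # from the "Epoch 1/50" lines in the logs above


def total_training_hours(seconds_per_epoch: float, epochs: int = EPOCHS) -> float:
    """Estimate total wall-clock training time in hours."""
    return seconds_per_epoch * epochs / 3600


fast = total_training_hours(60)    # GPU in use, roughly 60 s per epoch
slow = total_training_hours(1200)  # after the downgrade, roughly 1200 s per epoch
print(f"~{fast:.1f} h with the GPU, ~{slow:.1f} h without it")
```

Under these assumptions the full run takes under an hour with the GPU and well over half a day without it, which is consistent with the session timing out before training finishes.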

Yan_Du · December 28, 2023, 6:30am

I have exactly the same problem. Please let me know if there is a solution. Thanks!!

Hafiz_Uzair · December 28, 2023, 10:42am

I’m not sure exactly what problem you are facing, but if you are reaching the GPU limit before training completes, you can use a distributed strategy (mirrored strategy). By doing this, your GPU would never reach its limit, since it will train on your data in parallel pipelines. If you still have any problems, feel free to ask.

Note: Make sure you first install the 2.8.0 version of TensorFlow, because the grader is compatible with only this version.

Jorge_Murria · December 28, 2023, 10:59am

Hi @Hafiz_Uzair , thanks for your reply.
The problem I’m having (and with C3W3 assignment too) is that if I install the 2.8.0 version, the GPU is not used.
With all the same code but not installing the old version, the GPU is used, so I think there is a problem with that version and the current version of Colab.
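
One way to check this directly, rather than inferring it from epoch times, is to ask TensorFlow which GPUs it can see. The sketch below uses `tf.config.list_physical_devices`; the wrapper name `gpu_report` is hypothetical, and the import is done lazily so the function still returns a message where TensorFlow is missing.

```python
def gpu_report() -> str:
    """Describe which GPUs the installed TensorFlow can see, if any."""
    try:
        import tensorflow as tf  # lazy import so this also runs without TensorFlow
    except ImportError:
        return "TensorFlow is not installed in this environment"
    gpus = tf.config.list_physical_devices("GPU")
    if not gpus:
        return f"TensorFlow {tf.__version__} sees no GPU; training will run on the CPU"
    names = ", ".join(gpu.name for gpu in gpus)
    return f"TensorFlow {tf.__version__} sees {len(gpus)} GPU(s): {names}"


print(gpu_report())
```

Running it once with and once without the downgrade cell should show whether the pinned version is the one that loses the GPU.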

Habibi_Ahmadi_Muslim · December 28, 2023, 12:08pm

Hi @Jorge_Murria, I’ve previously encountered the same issue. I suggest training with TensorFlow 2.15 first and then, after saving the model, retraining it with TensorFlow 2.8. This can be done by setting trainable=True for the last layer (model.layers[-1:]) and freezing the other layers. This workaround is needed because Colab notebooks no longer support GPU acceleration with TensorFlow 2.8. This strategy should help reduce training time. Additionally, for local development, consider installing CUDA 11.2 and cuDNN 8.1 and running TensorFlow version 2.8.

Hope this helps!

Jorge_Murria · December 28, 2023, 12:36pm

Hi @Habibi_Ahmadi_Muslim, I will try it. It seems a good workaround until the grader is updated to work with the latest TensorFlow version.
Thanks!

Jorge_Murria · December 28, 2023, 1:26pm

Hi Habibi, I have followed your instructions, but the grader’s message is the same:
Your model could not be loaded. Make sure it is a valid h5 file.

But the model is working properly:

Number of predictions where iou > threshold(0.5): 281
Number of predictions where iou < threshold(0.5): 219

What else can I do?

Habibi_Ahmadi_Muslim · December 28, 2023, 2:18pm

Hello, I have successfully solved this assignment, and I did it following these steps:

1. If you are working on a local device, please install the following packages:
!pip install tensorflow==2.8.3
!pip install keras==2.8.0
!pip install tensorflow_datasets==4.9.2
!pip install protobuf==3.20.*
!pip install numpy==1.23.5
!pip install Pillow==9.4.0
!pip install matplotlib==3.7.1
!pip install tqdm==4.66.1
!pip install pandas==1.5.3
!pip install scipy==1.11.4
!pip install tensorflow-hub==0.12.0
!pip install opencv-python==4.8.*
2. Adjust the bash kernel according to your operating system. If you are using Windows, run the bash kernel manually in PowerShell to update incompatible packages simultaneously.
3. Initialize a new model using tf.keras.models.load_model("your_model_path", compile=False)
4. Continue by calling model.compile again and redoing the other settings so that they follow the assessment’s rules.

I hope this helps!

Jorge_Murria · December 28, 2023, 2:28pm

Hi @Habibi_Ahmadi_Muslim,
Thanks for your help, but I haven’t been able to install TensorFlow locally.
I’m only working in Colab.

Habibi_Ahmadi_Muslim · December 28, 2023, 2:37pm

Okay, you can ignore steps 1 and 2. Please start from step 3. One thing I forgot to explain earlier is that after running the kernel to prepare the dataset, please immediately load the model that you trained and saved previously.

Habibi_Ahmadi_Muslim · December 28, 2023, 2:42pm

I provide the following reference for configuring the model.h5 after loading it:
model.trainable = False
NUM_LAYERS = 1
for layer in model.layers[-NUM_LAYERS:]:
    layer.trainable = True
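
For readers who want the reload, freeze, recompile, and save steps in one place, here is a minimal sketch. It is not the assignment's reference code: both helper names are hypothetical, the optimizer (SGD with momentum 0.9) and the MSE loss are assumptions drawn from the exercise note and the training logs earlier in the thread, and `train_data` / `val_data` stand for whatever datasets the notebook builds. This variant sets the `trainable` flag layer by layer instead of on the whole model. The thread reports mixed results with the workaround, so treat it as something to try rather than a guaranteed fix.

```python
from types import SimpleNamespace


def freeze_all_but_last(model, num_trainable: int = 1) -> int:
    """Freeze every layer except the last `num_trainable`; return how many stay trainable.

    Works on any object with a `layers` list whose items have a `trainable`
    flag, which is the interface tf.keras models expose.
    """
    if num_trainable < 1:
        raise ValueError("num_trainable must be at least 1")
    layers = list(model.layers)
    cutoff = max(len(layers) - num_trainable, 0)
    for index, layer in enumerate(layers):
        layer.trainable = index >= cutoff
    return sum(1 for layer in layers if layer.trainable)


def finetune_and_resave(src_path, dst_path, train_data, val_data=None, epochs=1):
    """Hypothetical helper: reload a saved model, fine-tune its last layer, re-save it.

    Meant to be called in the runtime where TensorFlow 2.8 is installed.
    """
    import tensorflow as tf  # lazy import; only needed when this function is called

    model = tf.keras.models.load_model(src_path, compile=False)
    freeze_all_but_last(model, num_trainable=1)
    # Recompile after changing the trainable flags. Optimizer and loss are
    # assumptions based on the thread (SGD with momentum 0.9, MSE loss).
    model.compile(optimizer=tf.keras.optimizers.SGD(momentum=0.9), loss="mse")
    model.fit(train_data, validation_data=val_data, epochs=epochs)
    model.save(dst_path)  # a path ending in .h5 selects the HDF5 format
    return model


# Stand-in objects show the freezing logic without needing TensorFlow.
demo = SimpleNamespace(layers=[SimpleNamespace(trainable=True) for _ in range(4)])
print(freeze_all_but_last(demo), [layer.trainable for layer in demo.layers])
```

In the notebook, the call would look something like `finetune_and_resave("birds_tf215.h5", "birds.h5", train_ds, val_ds)`, where the file and dataset names are placeholders.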

Hafiz_Uzair · December 28, 2023, 4:01pm

Hi @Habibi_Ahmadi_Muslim, did you solve the zombie detector problem (the week 2 graded assignment)?

Habibi_Ahmadi_Muslim · December 28, 2023, 5:09pm

Hi @Hafiz_Uzair, I have successfully completed all assignments from C3 and didn’t encounter any major issues, except for assignments W1 and W4.
