C1_W1_Assignment- Model does not learn on personal PC with GPU

vivdon · February 28, 2025, 5:52am

Hello Everyone,

I’ve encountered several issues while working on the assignment locally, and I would appreciate some guidance. Here’s a summary of my experience:

1. Problems Encountered

Outdated Code:

I adapted the assignment to TensorFlow v2.16.1 using tf.data.Dataset instead of deprecated methods.

Bug in Code:

I resolved a bug related to dataset loading (details here: link).

Model Not Learning:

Despite my efforts, the model’s training accuracy never goes above ~0.47, even after 100 epochs.
Validation accuracy remains similarly low.

2. What I Tried

Updated TensorFlow code:

Rewrote the assignment using model.fit() instead of model.fit_generator().

Used Callbacks:

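import tensorflow as tf

# LOG_DIR and checkpoint_prefix are assumed to be defined earlier in the notebook.
callbacks = [
    # Log training metrics for TensorBoard.
    tf.keras.callbacks.TensorBoard(log_dir=LOG_DIR),
    # Save only the weights with the lowest validation loss seen so far.
    tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_prefix,
                                       save_weights_only=True,
                                       save_best_only=True,
                                       monitor='val_loss',
                                       mode='min'),
    # Stop after 10 epochs without val_loss improvement and restore the best weights.
    tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True),
    # Multiply the learning rate by 0.1 after 5 epochs without improvement, down to 1e-5.
    tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=5, min_lr=0.00001)
]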
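A minimal sketch of how the two `val_loss`-based callbacks above can interact when the validation loss stalls. This is a simplified pure-Python model, not the Keras implementation: it ignores `cooldown` and `min_delta`, and the loss values and the initial learning rate of 1e-3 are made up for illustration.

```python
def simulate_callbacks(val_losses, initial_lr, factor=0.1, lr_patience=5,
                       min_lr=1e-5, stop_patience=10):
    """Simplified model of ReduceLROnPlateau plus EarlyStopping on val_loss.

    Returns (epochs_run, lr_history), where lr_history[i] is the learning
    rate in effect during epoch i + 1.
    """
    lr = initial_lr
    best = float("inf")
    lr_wait = 0    # epochs without improvement, counted for the LR reduction
    stop_wait = 0  # epochs without improvement, counted for early stopping
    lr_history = []
    for epoch, loss in enumerate(val_losses, start=1):
        lr_history.append(lr)
        if loss < best:
            # Improvement: remember it and reset both counters.
            best = loss
            lr_wait = 0
            stop_wait = 0
            continue
        lr_wait += 1
        stop_wait += 1
        if lr_wait >= lr_patience:
            # Plateau: shrink the learning rate, but never below min_lr.
            lr = max(lr * factor, min_lr)
            lr_wait = 0
        if stop_wait >= stop_patience:
            # No improvement for stop_patience epochs: training ends here.
            return epoch, lr_history
    return len(val_losses), lr_history


# Hypothetical validation losses: two improving epochs, then a flat plateau.
val_losses = [0.70, 0.65] + [0.66] * 20
epochs_run, lr_history = simulate_callbacks(val_losses, initial_lr=1e-3)
print(epochs_run)  # 12
print(lr_history)
```

Under these assumptions the run ends ten epochs after the best one, with a single learning-rate reduction along the way, so a run configured for 100 epochs may stop much earlier if the validation loss never improves.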

Ran Original Code:

I ran the original assignment code (with minor updates), but training accuracy still stagnates at ~0.47.

Used different Preprocessing:

Used tf.keras.applications.densenet.preprocess_input for image preprocessing.
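A minimal sketch of why the choice of preprocessing matters. It assumes that `densenet.preprocess_input` scales pixels to 0-1 and then normalizes each channel with the ImageNet mean and standard deviation. Per-image standardization is shown purely for comparison; the thread does not say which scheme the original notebook or its pre-trained weights use. The function names and pixel values are hypothetical.

```python
import statistics

# ImageNet per-channel statistics (RGB) used by the 0-1 scaling scheme.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def imagenet_style(pixel_rgb):
    """Scale a 0-255 RGB pixel to 0-1, then normalize each channel."""
    return tuple((v / 255.0 - m) / s
                 for v, m, s in zip(pixel_rgb, IMAGENET_MEAN, IMAGENET_STD))


def samplewise_standardize(values):
    """Standardize one image's pixel values to zero mean and unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# A made-up, very dark 4-pixel grayscale "image".
dark_image = [10, 20, 30, 40]

# Scheme 1: fixed ImageNet statistics, so a dark image stays far below zero.
fixed = [imagenet_style((v, v, v))[0] for v in dark_image]

# Scheme 2: per-image statistics, so the same image is centered on zero.
per_image = samplewise_standardize(dark_image)

print(fixed)
print(per_image)
```

The same raw pixels end up in different numeric ranges under the two schemes. If pre-trained weights were fitted under one scheme and the inputs are prepared with another, the network sees a shifted input distribution, which is one possible thing to rule out here.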

Increased Epochs:

Trained for 100 epochs but saw no significant improvement in accuracy.

3. Expected vs Actual Behavior

Expected: Based on my prior experience with transfer learning, I expected training accuracy to reach ~0.70 within the first few epochs or even start overfitting.
Actual: Training accuracy remains stagnant at ~0.47, and validation accuracy does not improve significantly either.
These are the plots from a recent run of up to 60 epochs.

[Figure: acc_curve]

[Figure: loss_curve]
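One thing worth checking with a plateau like this is what the reported accuracy actually measures. The sketch below assumes a multi-label model with several independent sigmoid outputs, which the thread does not state, and all labels and predictions are invented for illustration. It contrasts an argmax-style (categorical) accuracy with a per-label thresholded (binary) accuracy on the same predictions.

```python
# Hypothetical multi-label data: 4 samples, 3 independent labels each.
y_true = [
    [0, 0, 0],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 0],
]
y_pred = [
    [0.1, 0.2, 0.3],
    [0.9, 0.1, 0.8],
    [0.2, 0.7, 0.1],
    [0.2, 0.3, 0.4],
]


def argmax(row):
    """Index of the largest value (first one on ties)."""
    return max(range(len(row)), key=lambda i: row[i])


def categorical_accuracy(y_true, y_pred):
    """Fraction of samples where argmax(y_pred) equals argmax(y_true)."""
    hits = sum(argmax(t) == argmax(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of individual labels that are correct after thresholding."""
    correct = 0
    total = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            correct += int((p >= threshold) == bool(t))
            total += 1
    return correct / total


print(categorical_accuracy(y_true, y_pred))  # 0.5
print(binary_accuracy(y_true, y_pred))       # 1.0
```

In this made-up case every label is predicted correctly, yet the argmax-style accuracy is only 0.5, because samples with no positive label have no meaningful argmax. If the plotted metric is of that kind, it could stall even while the model learns. A per-label metric such as `tf.keras.metrics.BinaryAccuracy` or `tf.keras.metrics.AUC` may be more informative under this assumption.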

4. Questions

Could there be an issue with my preprocessing pipeline or dataset preparation?
Are there any additional steps I should take to debug why the model is not learning?
Has anyone else faced similar issues with this assignment?

I’ve spent several days trying to debug this issue and would greatly appreciate any insights or suggestions.

Thank you in advance!
Best regards,
Vivek Patel

TMosh · February 28, 2025, 6:18am

The easiest path to follow is to install exactly the same versions of the tools that Coursera provides.

Then, once you get that working locally, you can experiment with updating various pieces to cope with all the recent tool updates. That’s an endless treadmill.

vivdon · February 28, 2025, 7:37am

Hello TMosh,
Thank you for your reply. But the code has not been updated for a few years (I guess). It uses TensorFlow version 1.15.0. I could not install such an old version easily; I would have to build a Docker container or something similar. Did I enroll in an older version of the course? If not, how do I ask them, or create a ticket, to update the course? This is just the first assignment of my full course (AI for Medicine), which has several sub-courses.

Just for information, I have compared the outputs step by step. Would it be possible for you to look at the notebook? I am sure it would be helpful for others too.

BR,
Vivek

TMosh · February 28, 2025, 8:11am

Based on past practice, updating the course is extremely unlikely. It’s intended to be run on Coursera’s platform.

Sorry, I have not studied that course’s material.

Deepti_Prasad · February 28, 2025, 8:55am

This feedback was already provided to the l.t. of the course. The response I got was that there won't be an immediate update, as the course was created 2-3 years back. That was the response when I asked for the course to be updated.

If you want to practice an assignment similar to this one, Kaggle is a place you can explore for now.

As the mentor mentioned, matching your local versions to the versions used in the module is the only way to go. That being said, some TensorFlow versions are deprecated, so even working through that you might have setbacks.

If you want to work on the same assignment, the best step would be to work on the code yourself.

Also, just a pointer, Vivek: from what I remember when I did the course, the training of these models was not up to par. Your expected training accuracy was 0.70, but the accuracy in the main course is of course also not getting above 80 or 90 percent.

When I checked the metadata, I felt the model was running on a small number of images, so it could probably hardly find much feature importance in the data it was training on.

The main idea behind the assignment was to understand how the code needs to be planned, which it did.

Hope this helps!

vivdon · February 28, 2025, 9:16am

Hello Deepti,

Thank you for your reply.

Yes, I understand. But I think either there is a huge bug in the course or I am doing something wrong that I cannot figure out. I have followed the exact steps from the assignment and verified my results against those in the assignment.

But my model is nowhere near overfitting. It is simply not learning. I have never had such an issue before.

I passed the assignment, but I am not happy with just replicating the same thing. If the issue comes from the assignment, then it must be corrected (it is fine if they do not update the course). Judging from a previous bug like the one in this link, it seems that they never ran this script to train the model; otherwise they would have caught it just by running model.fit_generator().

If you have time, it would be nice if you could look over the assignment or my code. Maybe you will easily detect the issue.

BR,
Vivek

Deepti_Prasad · February 28, 2025, 12:46pm

In the link you shared, the bug was not in the assignment; the learner who posted the query had written incorrect code, which caused the type error.

No bugs are left in the assignment when it is run in the Coursera environment.
