C1_W1_Assignment- Model does not learn on personal PC with GPU

Hello Everyone,

I’ve encountered several issues while working on the assignment locally, and I would appreciate some guidance. Here’s a summary of my experience:

1. Problems Encountered

  1. Outdated Code:
  • I adapted the assignment to TensorFlow v2.16.1 using tf.data.Dataset instead of deprecated methods.
  2. Bug in Code:
  • I resolved a bug related to dataset loading (details here: link).
  3. Model Not Learning:
  • Despite my efforts, the model’s training accuracy never goes above ~0.47, even after 100 epochs.
  • Validation accuracy remains similarly low.

2. What I Tried

  1. Updated TensorFlow code:
  • Rewrote the assignment using model.fit() instead of model.fit_generator().
  2. Used Callbacks:
callbacks = [
    tf.keras.callbacks.TensorBoard(log_dir=LOG_DIR),
    tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_prefix,
                                       save_weights_only=True,
                                       save_best_only=True,
                                       monitor='val_loss',
                                       mode='min'),
    tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True),
    tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=5, min_lr=0.00001)
]
  3. Ran Original Code:
  • I ran the original assignment code (with minor updates), but training accuracy still stagnates at ~0.47.
  4. Used Different Preprocessing:
  • Used tf.keras.applications.densenet.preprocess_input for image preprocessing.
  5. Increased Epochs:
  • Trained for 100 epochs but saw no significant improvement in accuracy.
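For anyone reading along, the ReduceLROnPlateau settings in the callbacks list above (factor=0.1, min_lr=0.00001) leave room for only a few reductions before the learning rate hits its floor. Below is a minimal sketch of that arithmetic in plain Python. The starting learning rate of 1e-3 is an assumption, since the post does not state the optimizer settings, and the helper name is hypothetical.

```python
def lr_after_reductions(initial_lr, factor, min_lr, reductions):
    """Learning rate after a number of plateau-triggered reductions.

    Each reduction multiplies the rate by `factor` and clamps it at `min_lr`.
    """
    lr = initial_lr
    for _ in range(reductions):
        lr = max(lr * factor, min_lr)
    return lr


if __name__ == "__main__":
    # Assumed starting rate of 1e-3 (not given in the post).
    for n in range(4):
        print(n, lr_after_reductions(1e-3, 0.1, 1e-5, n))
```

Under that assumption the floor is reached after two reductions, so later plateaus no longer change the learning rate. Logging the optimizer's learning rate per epoch would show whether training spent most of its 100 epochs at the floor.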

3. Expected vs Actual Behavior

  • Expected: Based on my prior experience with transfer learning, I expected training accuracy to reach ~0.70 within the first few epochs or even start overfitting.
  • Actual: Training accuracy remains stagnant at ~0.47, and validation accuracy does not improve significantly either.
  • The attached pictures are from a recent run of up to 60 epochs.
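One way to put a stagnant number like ~0.47 in context is to compare it with the accuracy of a trivial predictor that always outputs the most common label. The sketch below is only an illustration: the label list is made up, the helper name is hypothetical, and the right comparison depends on how accuracy is defined for the assignment's labels.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)


if __name__ == "__main__":
    # Made-up labels for illustration; replace with the real training labels.
    example_labels = [0, 0, 1, 1, 1]
    print(majority_baseline_accuracy(example_labels))
```

If the training accuracy sits at or below this baseline, the model is probably not using the image content at all, which points toward the input pipeline or the label alignment rather than the number of epochs.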

4. Questions

  • Could there be an issue with my preprocessing pipeline or dataset preparation?
  • Are there any additional steps I should take to debug why the model is not learning?
  • Has anyone else faced similar issues with this assignment?
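On the preprocessing question, one thing that may be worth ruling out is double scaling. To my knowledge, tf.keras.applications.densenet.preprocess_input expects pixel values in the 0-255 range, divides them by 255 and then normalizes with the ImageNet channel means and standard deviations. This is worth confirming against the installed version. If the tf.data pipeline already rescales images to 0-1 before that call, the inputs collapse into a very narrow range. The plain-Python sketch below mimics that scaling for a single RGB pixel, and the function names are hypothetical.

```python
# ImageNet channel statistics used by the "torch"-style normalization.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def torch_mode_preprocess(pixel):
    """Scale one RGB pixel by 1/255, then normalize each channel."""
    return tuple(
        (c / 255.0 - m) / s
        for c, m, s in zip(pixel, IMAGENET_MEAN, IMAGENET_STD)
    )


def output_span(darkest, brightest):
    """Per-channel distance between the darkest and brightest outputs."""
    lo = torch_mode_preprocess(darkest)
    hi = torch_mode_preprocess(brightest)
    return tuple(h - l for h, l in zip(hi, lo))


if __name__ == "__main__":
    # Inputs in the expected 0-255 range.
    print(output_span((0, 0, 0), (255, 255, 255)))
    # Inputs that were already rescaled to 0-1 before preprocessing.
    print(output_span((0, 0, 0), (1, 1, 1)))
```

In the second case every image ends up nearly constant, which could produce exactly the kind of flat training curve described above. Printing the min and max of one batch just before it enters the model is a quick way to check.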

I’ve spent several days trying to debug this issue and would greatly appreciate any insights or suggestions.

Thank you in advance!
Best regards,
Vivek Patel

The easiest path to follow is to install exactly the same versions of the tools that Coursera provides.

Then, once you get that working locally, you can experiment with updating various pieces to cope with all the recent tool updates. That’s an endless treadmill.


Hello TMosh,
thank you for your reply. But the code has not been updated in a few years (I guess). It uses TensorFlow version 1.15.0, and I could not easily install such an old version; I would have to build a Docker container or something similar. Did I enroll in an older version of the course? If not, how do I ask them, or create a ticket, to update the course? This is just the first assignment of my full course (AI for Medicine), which has several sub-courses.

Just for information, I have compared the outputs step by step. Would it be possible for you to look at the notebook? I am sure it would be helpful for others too.
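A likely reason TensorFlow 1.15.0 is hard to install on a current machine is the Python version: as far as I recall, the published 1.15 wheels for Python 3 only cover roughly Python 3.5 to 3.7. That range is an assumption to verify on the package index, not a confirmed fact. A minimal check of the local interpreter, with a hypothetical helper name, could look like this:

```python
import sys

# Assumption: Python 3 wheels for TensorFlow 1.15 exist only for 3.5-3.7.
ASSUMED_SUPPORTED_RANGE = ((3, 5), (3, 7))


def wheel_likely_available(version, supported=ASSUMED_SUPPORTED_RANGE):
    """Return True if the (major, minor) version falls in the assumed range."""
    lowest, highest = supported
    return lowest <= tuple(version[:2]) <= highest


if __name__ == "__main__":
    print(sys.version_info[:2], wheel_likely_available(sys.version_info))
```

If this prints False, a separate environment with an older interpreter (for example a container, as mentioned above) is probably needed before pip can find a matching package.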

BR,
Vivek

Based on past practice, updating the course is extremely unlikely. It’s intended to be run on Coursera’s platform.

Sorry, I have not studied that course’s material.

This feedback was already provided to the l.t. of the course, and the response I got was that there won’t be an immediate update, as the course was created 2-3 years ago. That was the response when I asked for the course to be updated.

If you want to practice an assignment similar to this one, Kaggle is a place you can explore for now.

As the mentor mentioned, matching the versions you use locally to those of the course is the only way to go. That said, some TensorFlow versions have been deprecated, so even then you might run into setbacks.

If you want to work on the same assignment, the best step would be to work through the code yourself.

Also, just a pointer, Vivek: from what I remember of when I did the course, the training of these models was not up to par. You expected a training accuracy of 0.70, but even in the main course the model does not get above 80 or 90 percent accuracy.

When I checked the metadata, I felt the model was being trained on a small number of images, so it probably could not extract many important features from the training data.

The main idea behind the assignment was to understand how the code needs to be planned, which it did.

Hope this helps!


Hello Deepti,

thank you for your reply.

Yes, I understand. But I think either there is a huge bug in the course or I am doing something wrong that I cannot figure out. I followed the exact steps from the assignment and verified them against the assignment's results.

But my model is nowhere near overfitting. It is simply not learning. I have never had such an issue before.

I passed the assignment, but I am not happy with just replicating the same thing. If the issue comes from the assignment, it should be corrected (it is fine if they do not update the course). Judging from the previous bug, like the one at this link, it seems that they never ran this script to train the model; otherwise they would have caught it just by running model.fit_generator().

If you have time, it would be nice if you could look over the assignment or my code. Maybe you will easily spot the issue.

BR,
Vivek

In the link you shared, the bug was not in the assignment; the learner who posted the query had written incorrect code, which caused the type error.

No bugs remain in the assignment when it is run in the Coursera environment.