Week 3 assignment 2 - dead kernel

Everything works/passes through “3 - U-Net” (which is where all the coding exercises are). Then at the start of step 4 “Train the Model” (where there is nothing for me to code), it starts to train the model and then the kernel dies. Here’s the cell and output. Everything prior has run and passed. I’ve tried “kernel/restart & clear output” and cell/run all, and it gives the exact same problem every time. What else should I try?

EPOCHS = 5
VAL_SUBSPLITS = 5
BUFFER_SIZE = 500
BATCH_SIZE = 32
train_dataset = processed_image_ds.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
print(processed_image_ds.element_spec)
model_history = unet.fit(train_dataset, epochs=EPOCHS)
(TensorSpec(shape=(96, 128, 3), dtype=tf.float32, name=None), TensorSpec(shape=(96, 128, 1), dtype=tf.uint8, name=None))
Epoch 1/5
3/34 [=>…] - ETA: 2:20 - loss: 2.9826 - accuracy: 0.1996

This issue has been reported by other learners.
Coursera is said to be working on a fix for it.

Hello,

As Tom mentioned, Coursera staff is working on it. Meanwhile, you can submit your assignment without running any code. The grader doesn’t need to see the output. Try submitting your assignment if you have completed all the exercises.

Best,
Saif.

I’ve tried submitting; it says failed. Guessing it’s because the auto-grader requires everything to complete and it doesn’t due to hung kernel? Either way, I’ll move on to week 4 until a fix is ready (I’m not in a particular hurry).

What feedback do you get from the grader? Please post a screen capture image.

Could be it detected some other coding error, not related to the kernel.

The grader only tests your functions, it skips the tasks that seem to make the kernel die.

ex-w3a2

Cell #8. Can’t compile the student’s code. Error: AssertionError(‘Error in test’)

I don’t know what “w3a2” or “cell #8” mean. But I can say for sure that all cells before the markdown block titled “4 - Train the Model” run and pass. And there are more than 8 code cells that run and pass.

I’d guess that “w3a2” refers to Week 3 Assignment 2.
In general, you can’t submit a partially complete notebook for grading.
Did you happen to rename your notebook ipynb file?

The reference to cell #8 means that the error was found in the 8th code cell inside your notebook. You can just count them in the notebook and see which one made the grader unhappy.
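
For anyone who would rather not count by hand, here is a minimal sketch of finding the Nth code cell programmatically. It assumes the usual .ipynb JSON layout (a top-level `cells` list whose entries carry `cell_type` and `source`), and it assumes the grader counts code cells from 1 in document order, which nothing in this thread confirms. The helper name `nth_code_cell` and the sample notebook are made up for illustration.

```python
import json

def nth_code_cell(notebook: dict, n: int) -> str:
    """Return the source of the n-th code cell (1-based), skipping markdown cells."""
    code_cells = [c for c in notebook.get("cells", []) if c.get("cell_type") == "code"]
    if not 1 <= n <= len(code_cells):
        raise IndexError(f"notebook has {len(code_cells)} code cells, asked for #{n}")
    source = code_cells[n - 1].get("source", "")
    # The notebook format stores source either as one string or as a list of lines.
    return source if isinstance(source, str) else "".join(source)

# Tiny in-memory stand-in for a real notebook (hypothetical content).
raw = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["import numpy as np\n"]},
        {"cell_type": "markdown", "source": "Some notes"},
        {"cell_type": "code", "source": ["x = 1\n", "print(x)"]},
    ]
}
# Round-trip through JSON to mimic what json.load() returns for a file on disk.
sample = json.loads(json.dumps(raw))

second = nth_code_cell(sample, 2)
print(second)
```

With a real notebook, `sample` would instead come from `json.load(open(path))`, where `path` is wherever your .ipynb file lives.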

No, I didn’t rename the notebook. Yes, I did complete everything before submitting (at least I think I did; it’s hard to know since I can’t run anything after “4” because the kernel hangs). :(

Code cell 8 is a test code cell (not a student code cell) used to validate code cell 7. Here is the code in cell 8 along with its output when I run it manually (it runs fine and passes):

Coursera code:
input_size=(96, 128, 3)
n_filters = 32
inputs = Input(input_size)
cblock1 = conv_block(inputs, n_filters * 1)
model1 = tf.keras.Model(inputs=inputs, outputs=cblock1)

output1 = [['InputLayer', [(None, 96, 128, 3)], 0],
           ['Conv2D', (None, 96, 128, 32), 896, 'same', 'relu', 'HeNormal'],
           ['Conv2D', (None, 96, 128, 32), 9248, 'same', 'relu', 'HeNormal'],
           ['MaxPooling2D', (None, 48, 64, 32), 0, (2, 2)]]

print('Block 1:')
for layer in summary(model1):
    print(layer)

comparator(summary(model1), output1)

inputs = Input(input_size)
cblock1 = conv_block(inputs, n_filters * 32, dropout_prob=0.1, max_pooling=True)
model2 = tf.keras.Model(inputs=inputs, outputs=cblock1)

output2 = [['InputLayer', [(None, 96, 128, 3)], 0],
           ['Conv2D', (None, 96, 128, 1024), 28672, 'same', 'relu', 'HeNormal'],
           ['Conv2D', (None, 96, 128, 1024), 9438208, 'same', 'relu', 'HeNormal'],
           ['Dropout', (None, 96, 128, 1024), 0, 0.1],
           ['MaxPooling2D', (None, 48, 64, 1024), 0, (2, 2)]]

print('\nBlock 2:')
for layer in summary(model2):
    print(layer)

comparator(summary(model2), output2)

Output produced when I run it manually:
Block 1:
[‘InputLayer’, [(None, 96, 128, 3)], 0]
[‘Conv2D’, (None, 96, 128, 32), 896, ‘same’, ‘relu’, ‘HeNormal’]
[‘Conv2D’, (None, 96, 128, 32), 9248, ‘same’, ‘relu’, ‘HeNormal’]
[‘MaxPooling2D’, (None, 48, 64, 32), 0, (2, 2)]
All tests passed!

Block 2:
[‘InputLayer’, [(None, 96, 128, 3)], 0]
[‘Conv2D’, (None, 96, 128, 1024), 28672, ‘same’, ‘relu’, ‘HeNormal’]
[‘Conv2D’, (None, 96, 128, 1024), 9438208, ‘same’, ‘relu’, ‘HeNormal’]
[‘Dropout’, (None, 96, 128, 1024), 0, 0.1]
[‘MaxPooling2D’, (None, 48, 64, 1024), 0, (2, 2)]
All tests passed!

That means the error is in the code in cell 7, but it is only detected when cell 8 runs the code in cell 7.

Here’s the code for cell 7. I think it’s right, as the instructions were pretty clear and it compiles and passes all the tests. If you see an error, please LMK:

<code redacted per TMosh’s request>

BTW - thank you for looking at this. I’m not used to getting a real human in this type of scaled out online learning. Thank you!

Please do not share your code on the forum. If a mentor needs to see your code, we’ll ask you to send it via a private message.

The way you have specified the kernel size is not correct. You’ve created a scalar variable that holds the value 3. But what you need is to specify that kernel_size is a named parameter inside the Conv2D() parameter list.

I’m not sure if that’s the whole problem, but it certainly is one issue.

Similarly, you haven’t specified the “filters=” named parameter either.

The named parameters are important, because without the parameter names, you’re relying on the order of the parameters to be exactly correct as they appear in the class definition - including any parameters for which the default value would otherwise be used.

The correct syntax is the same one you’ve used for “activation=…” and “padding=…”.
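
To make the positional-versus-named distinction concrete, here is a small sketch in plain Python. `make_layer` is a made-up stand-in, not the Keras constructor, and its parameter list is an assumption chosen only to resemble a layer signature.

```python
def make_layer(filters, kernel_size, strides=1, padding="valid", activation=None):
    """Hypothetical stand-in for a layer constructor; it just records its arguments."""
    return {
        "filters": filters,
        "kernel_size": kernel_size,
        "strides": strides,
        "padding": padding,
        "activation": activation,
    }

# First two arguments by position, the rest by keyword.
by_position = make_layer(32, 3, activation="relu", padding="same")

# Everything by keyword; the order written at the call site no longer matters.
by_keyword = make_layer(padding="same", activation="relu", kernel_size=3, filters=32)

# Both calls bind the same values to the same parameters.
same_binding = by_position == by_keyword

# The risk with positions: a third positional value lands in `strides`,
# not in `padding`, because binding follows the declared parameter order.
misplaced = make_layer(32, 3, "same")
print(same_binding, misplaced["strides"], misplaced["padding"])
```

In this sketch, positional and keyword calls agree as long as the positional values line up with the declared order; the two styles only diverge once a default-valued parameter is skipped.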

There is no kernel_size parameter passed that I can use. And the instructions in the preceding mark-down block specifically say to use a kernel size of 3. The parameter list is correct. If you look at the documentation for tf.keras.layers.Conv2D  |  TensorFlow v2.12.0, you will see that the number of filters and kernel_size are the first two parameters, so passing them by order works. The others need to be named as they are not passed in order.

So I think something else is going on.

LMK when I should remove the code or if you want a few more minutes to look at it to see if you can spot something else that might be wrong.

BTW; here are the instructions in the preceding mark-down block for your reference:

Implement conv_block(...). Here are the instructions for each step in the conv_block, or contracting block:

  • Add 2 Conv2D layers with n_filters filters with kernel_size set to 3, kernel_initializer set to ‘he_normal’, padding set to ‘same’ and ‘relu’ activation.
  • if dropout_prob > 0, then add a Dropout layer with parameter dropout_prob
  • If max_pooling is set to True, then add a MaxPooling2D layer with 2x2 pool size

kernel_size is a named parameter.
Use “kernel_size = 3” as one of the parameters to Conv2D().

Tom already highlighted that you need to set kernel_size inside the Conv2D(). All the code is already given to you; you just need to remove the None and write the correct value/term there. You don’t need to write anything else anywhere else.

See the below code given to you. Where do you see None? Just replace them with the correct value/term.

### START CODE HERE
conv = Conv2D(None, # Number of filters
              None,   # Kernel size   
              activation=None,
              padding=None,
              kernel_initializer=None)(inputs)
conv = Conv2D(None, # Number of filters
              None,   # Kernel size
              activation=None,
              padding=None,
              kernel_initializer=None)(conv)
### END CODE HERE

Best,
Saif.

Respectfully, it looks like you and Tom may be mistaken. The docs for Conv2D show that the first two parameters are positional and not keyword - check it out: tf.keras.layers.Conv2D  |  TensorFlow v2.12.0

And just to verify I wasn’t misunderstanding anything (which I doubted since I teach Python, but I do make mistakes, so was happy to try it out), when I changed my code from:
conv = Conv2D(n_filters, 3, …
To what you/Tom suggested:
conv = Conv2D(n_filters, kernel_size=3, …
Then things broke even worse. With my code, cell 8 says “all tests pass”; with your/Tom’s suggestion, cell 8 fails with error: TypeError: __init__() missing 1 required positional argument: ‘kernel_size’

Frankly this was as I expected, since the docs show it is a positional argument and not a keyword argument, and since I called it positionally in previous coding exercises. Anyways, I’m happy to get on a call with you or email thread so we can drill down on this 1-1 instead of in this public forum if you can spare the time? I would certainly appreciate some help getting this solved and keeping this thread marked unresolved until we get to the bottom of this.

Your code was:

### START CODE HERE
kernel_size = 3
conv = Conv2D(...)

In the above code, you defined kernel_size = 3 outside of the Conv2D. This is wrong. We don’t need to define anything outside of Conv2D. You need to do that inside of it. But one caveat is that kernel_size=3 is wrong. You just need to write an integer.

It’s better to see the examples in the Conv2D documentation that you shared.

Best,
Saif.

It seems you may be unfamiliar with how Python variables work, so let me clear up your misconception and explain why the two are the same. If you have any questions, please feel free to reach out as I am a Python teacher and love to help people learn Python better (it’s such a great programming language!).

Let’s look at this code snippet from the original code I posted:
kernel_size = 3 # This declares a variable called “kernel_size” whose value is 3
conv = Conv2D(n_filters, kernel_size, … # This passes the argument 3 via the variable kernel_size
The above code is 100% equivalent to
conv = Conv2D(n_filters, 3, … # this also passes 3 as the 2nd positional param

Both work perfectly, pass all tests when the notebook cells run, and generate the right pictures for all cells through the first three sections. That’s because they both pass the argument of 3 to the kernel_size parameter of Conv2D.
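
A minimal sketch of that equivalence, using a made-up callee (`record_args`, not anything from Keras) that simply returns what it was given:

```python
def record_args(*args, **kwargs):
    """Hypothetical callee that returns exactly what it received."""
    return args, kwargs

n_filters = 32
kernel_size = 3  # an ordinary local variable; the callee never sees its name

via_variable = record_args(n_filters, kernel_size, padding="same")
via_literal = record_args(n_filters, 3, padding="same")

# The callee receives identical values either way.
print(via_variable == via_literal)
print(via_variable)
```

The variable name exists only in the caller's scope, so from the callee's side the two calls are indistinguishable.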

I’m sorry if this level of Python is confusing; it just takes time to learn the nuances of any programming language - you will get there.

In the meantime, we still need to figure out what is wrong. And more to the point, why my code works perfectly in interactive mode, but the auto-grader doesn’t like it. I have a theory - could it be that the auto-grader uses a different version of tensorflow than the interactive notebook and there are some subtle differences in how those two work? Could you please find out what version of tensorflow is being used by the auto-grader? Then I can use Conda install at home to test the code with that version of tensorflow and perhaps figure out the issue myself since it looks too tricky for others to debug.
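
For the interactive side of that comparison, here is a minimal sketch of recording installed package versions from inside the notebook. It only reports what the interactive kernel has installed; the auto-grader's versions are not exposed anywhere in this thread, so those would still have to come from course staff. `installed_version` is a made-up helper name.

```python
from importlib import metadata

def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Report the packages relevant to this assignment; None means "not installed here".
for name in ("tensorflow", "numpy"):
    print(name, installed_version(name))
```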

In terms of Python, your code is correct but, I guess, the grader may use different values for kernel_size. That is why it is giving you an error as your defined value of kernel_size may override the value of the grader. Just a thought.

OK - I resorted to looking on the internet to find a working solution so I could diff and figure out what’s going on. It turns out my theory about interactive notebook and auto-grader using different Python configurations was correct. In particular this line of code was the culprit (which works fine in the interactive Python environment, but fails to compile in the auto-grader environment that we don’t have access to):
kernel_initializer=tf.keras.initializers.HeNormal

Instead, the autograder requires it to be specified like so:
kernel_initializer="he_normal"
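
One Python-level difference worth noting: tf.keras.initializers.HeNormal written without parentheses is the class itself, not an initializer instance. Whether a library accepts a bare class and instantiates it internally can vary by version, and the idea that this explains the grader failure is an assumption, not something confirmed in this thread. The sketch below uses a made-up stand-in class, `HeNormalLike`, to show how the two forms look to code that inspects them.

```python
class HeNormalLike:
    """Hypothetical stand-in for an initializer class (not the Keras one)."""

    def __call__(self, shape):
        return f"weights of shape {shape}"

def describe(obj) -> str:
    """Name of the object's type, as code inspecting the value might report it."""
    return type(obj).__name__

as_class = describe(HeNormalLike)       # the class object itself, no parentheses
as_instance = describe(HeNormalLike())  # an instance created by calling the class
print(as_class, as_instance)
```

A bare class reports its type as `type`, while an instance reports the class name, so any code that reads the initializer's type name would see different values for the two forms.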

As a suggestion, you might consider making the two environments the same, since having code work interactively but fail in the auto-grader is really hard to debug (in fact, there is no obvious way to debug it except to look things up on the internet). Alternatively, just document the version the auto-grader uses.

Anyways, good learning all around. I hope you and Tom also learned a bit about how positional arguments and variables work in Python.