ResNet50 comparator

Hey all,

While checking my ResNet50 model using

comparator(summary(model), ResNet50_summary)

I got the following error:

Test failed
Expected value

['Conv2D', (None, 8, 8, 512), 66048, 'valid', 'linear', 'GlorotUniform']

does not match the input value:

['Conv2D', (None, 8, 8, 512), 131584, 'valid', 'linear', 'GlorotUniform']

I have checked the variable dimensions in my code several times against the descriptions. Any ideas?


The number that doesn’t match there is the number of trainable parameters in the layer. Note that your value is 512 less than twice the expected value. That suggests that what is wrong has to do with either the filter size or number of channels. But in either case, you’d expect more problems in the next layer as well.
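To make the arithmetic concrete, here is a minimal sketch of how a Conv2D layer's parameter count is computed. The 1x1 kernel size and the 128 and 256 input channels are assumptions on my part: they are not stated in the error message, but they happen to reproduce both numbers exactly. The helper name is made up for illustration.

```python
def conv2d_param_count(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """Trainable parameters in a Conv2D layer: one weight per
    (kernel position, input channel, filter), plus one bias per filter."""
    weights = kernel_h * kernel_w * in_channels * filters
    biases = filters if use_bias else 0
    return weights + biases


# Assumed configurations (not given in the thread): 1x1 kernels, 512 filters,
# with 128 input channels in one layer and 256 in the other.
count_128_in = conv2d_param_count(1, 1, 128, 512)
count_256_in = conv2d_param_count(1, 1, 256, 512)

print(count_128_in)                      # matches the "expected" row
print(count_256_in)                      # matches the "input" row
print(2 * count_128_in - count_256_in)   # the gap of 512 is the bias term
```

If those assumptions hold, both rows are legitimate layers of the model rather than one correct layer and one broken one, which fits the idea that they are simply listed in swapped positions.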

Actually, I think this is just a problem in the order in which the rows are added to the model description. Take a look at the “expected” value in the outputs.py file. Note that there is one case in which there are two adjacent layers that look just like your layer and the one it expects. I’ll bet they ended up in the opposite order in your code. One theory would be that it builds the graph in the order that things are referenced. I’ll bet you did the following “Add” layer with the arguments in the opposite order. Addition is commutative, of course, so you’d expect it not to matter, but maybe in this case it does.

Update: Yes! I can produce exactly that error by reversing the order of the operands on the Add for the shortcut layer in the convolutional_block. Try reversing those operands and I’ll bet the problem goes away!
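The "order in which things are referenced" theory can be illustrated with a toy model. This is only a sketch of the general idea, not Keras's actual traversal code: it walks a tiny graph depth-first from the output and lists each layer after its inputs. The layer names and both helper functions are hypothetical.

```python
def enumerate_layers(graph, output):
    """Post-order depth-first walk: each layer is listed after all of its
    inputs, and inputs are visited in the order the layer references them."""
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for parent in graph[name]:
            visit(parent)
        order.append(name)

    visit(output)
    return order


def build_block(add_operands):
    """Toy block: two conv branches fed by the same input, merged by an add."""
    return {
        "input": [],
        "main_conv": ["input"],
        "shortcut_conv": ["input"],
        "add": list(add_operands),
    }


main_first = enumerate_layers(build_block(["main_conv", "shortcut_conv"]), "add")
shortcut_first = enumerate_layers(build_block(["shortcut_conv", "main_conv"]), "add")

print(main_first)
print(shortcut_first)
```

Both graphs compute the same sum, but the two conv layers come out in opposite positions in the listing, which is the kind of difference a row-by-row comparison of summaries would flag.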


Hi Paul,
I noticed that as well. This is a bug. Add(X,Y) should be commutative.

Addition is commutative, but the point is that the order of the operands affects the order in which the graph of the “skip” layers gets enumerated. I’ve reported this to the course staff, but it’s not clear there is any easy solution. They’d have to write more complicated checking logic to accept either order. For the moment, you just have to “deal with it”.

Also please note that the addition operation there is part of the template code. They gave you the correct code that passes the check. You had no reason to alter it, so that’s on you.

Perhaps the unit test is being too specific in the parameters it is checking. If two solutions are functionally equivalent, the unit test should pass either one. Maybe the “number of trainable parameters” isn’t a good parameter to test.

That’s a good point. If they changed it to only check the operators and shapes at every level, that should still guarantee correctness. But is that simpler than changing the test to handle two different correct answers? Not sure. I guess it would depend on how dextrous they are with regular expressions.
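For what it's worth, here is a rough sketch of both options. It is not the course's comparator, whose implementation I haven't reproduced here. It assumes each summary row is a list shaped like the ones in the error message, with the parameter count at index 2, and all function names are made up.

```python
from collections import Counter

# Assumption: the trainable-parameter count sits at index 2 of each row,
# as in the rows shown in the error message.
PARAM_COUNT_INDEX = 2


def without_param_count(row):
    """Return the row as a tuple with the parameter count dropped."""
    return tuple(v for i, v in enumerate(row) if i != PARAM_COUNT_INDEX)


def same_ops_and_shapes(learner_rows, expected_rows):
    """Option 1: compare row by row, ignoring parameter counts."""
    return ([without_param_count(r) for r in learner_rows]
            == [without_param_count(r) for r in expected_rows])


def same_rows_any_order(learner_rows, expected_rows):
    """Option 2: compare full rows as a multiset, ignoring their order."""
    return Counter(map(tuple, learner_rows)) == Counter(map(tuple, expected_rows))


main = ["Conv2D", (None, 8, 8, 512), 66048, "valid", "linear", "GlorotUniform"]
shortcut = ["Conv2D", (None, 8, 8, 512), 131584, "valid", "linear", "GlorotUniform"]

expected = [main, shortcut]
learner = [shortcut, main]  # same layers, swapped positions

print(expected == learner)
print(same_ops_and_shapes(learner, expected))
print(same_rows_any_order(learner, expected))
```

Neither option needs regular expressions, but both are weaker checks than the original. The first would miss a layer with the wrong number of input channels, and the second would accept any permutation of the rows, including ones that do not correspond to a valid model.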

But the “band-aid” solution they picked was to move the “Add” logic from the user-completed area to the template, so it just papers over the problem. The only way the user can hit this issue is if they modify the given template code. Sigh.

I had this same error, and with your explanation I was able to correct it. But how on earth would I have been able to find this error by myself? What is there to learn here?

You’re right that this does seem a trifle over the top, in that it requires understanding at least the basics of how compute graphs work, together with the implications of the fact that their test does a textual comparison of the model “summary()” outputs. But to be fair, because of exactly this point, they changed the notebook quite a while ago (I forget exactly when, but it was more than a year ago) so that the code that adds the shortcut layer in the convolutional_block function is outside of the code you need to write. In other words, they wrote it correctly for you. So if you got it wrong, you must have gone out of your way to change something that did not require changing. Well, that or you copied an old solution off the internet, which is a) against the rules and b) generally risky, because the assignments evolve over time and there is no guarantee that “solutions” you find on GitHub are completely correct.