Is there a good way to determine how much information is being used in the lower-level convolutions vs the skip connections in a U-Net? I know we could compare validation scores for models of varying U depth, but are there other ways to determine this?
It is an interesting idea. But what do you mean by “how much information” in this case? How would you define or quantify that?
Here’s one thought that occurs to me: you could introduce a knob for the magnitudes of the activations on the skip connections. By analogy with dividing by keep_prob in inverted dropout, in this case you’d multiply by keep_prob to scale things down. The default is keep_prob = 1, of course, and then you see what happens as you dial in smaller values of keep_prob. So you’d be downscaling the 2-norms of the outputs of the skip layers. With the smaller values, do you get a less useful model? Of course that will probably interact with other hyperparameters like learning rate and number of iterations, so maybe it’s not so easy to get a reliable conclusion. Not sure that would work, but just a thought …
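To make the knob concrete, here’s a minimal sketch of one decoder step, assuming a Keras-style functional model (the block structure and the names upsampling_block and keep_prob are mine for illustration, not anything from the course assignment):

```python
import tensorflow as tf
from tensorflow.keras import layers

def upsampling_block(x, skip, n_filters, keep_prob=1.0):
    """One U-Net decoder step with a scaled skip connection.

    keep_prob=1.0 reproduces the usual U-Net behavior; values < 1
    shrink the 2-norm of the skip activations before concatenation.
    """
    x = layers.Conv2DTranspose(n_filters, 3, strides=2, padding="same")(x)
    # The "knob": scale down the skip path's activations.
    skip = layers.Lambda(lambda t: keep_prob * t)(skip)
    x = layers.Concatenate()([x, skip])
    x = layers.Conv2D(n_filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(n_filters, 3, padding="same", activation="relu")(x)
    return x
```

You could then train otherwise-identical models across a grid of keep_prob values and watch how the validation score degrades (or doesn’t) as the skip signal is attenuated.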
Thank you for your response, Paul. I like the idea of regularizing the skip connections to turn down their effect.
I had another thought: what do you think about looking at the diagonal of the Gram matrix for the concatenated skip/transpose-convolution channels on a few observations in the validation set? This might let us see how active the channels from the skip connection are relative to the channels from the transpose convolution.
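Something like this is what I’m picturing: a rough sketch assuming a Keras model, where channel_energies, the layer name, and the channel ordering in the concatenation are all hypothetical and would need adapting to the real network:

```python
import numpy as np
import tensorflow as tf

def channel_energies(model, concat_layer_name, x_batch, n_up_channels):
    """Diagonal of the Gram matrix of the concatenated activations.

    Assumes the first n_up_channels channels of the named Concatenate
    layer come from the transpose convolution and the rest from the
    skip connection.
    """
    probe = tf.keras.Model(model.input,
                           model.get_layer(concat_layer_name).output)
    acts = probe.predict(x_batch)             # shape (N, H, W, C)
    flat = acts.reshape(-1, acts.shape[-1])   # shape (N*H*W, C)
    diag = np.einsum("ij,ij->j", flat, flat)  # diag of flat.T @ flat
    return diag[:n_up_channels].mean(), diag[n_up_channels:].mean()
```

Comparing the two means over a few validation examples would give a crude activity ratio between the two sources of channels.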
That’s also an interesting idea. So you’d get a sense of the magnitudes of the channel activations, and hence of how “seriously” the network takes the skip inputs relative to the upsampled path. Since that’s a passive method that doesn’t require retraining, it might be easier to interpret the results.
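If you also wanted a read on the learned coefficients themselves, a complementary passive probe might be to split the kernel of the first convolution after the concatenation along its input channels and compare the weight mass each half gets. A sketch under the same assumptions as above (conv_layer_name and the channel split are hypothetical):

```python
import numpy as np

def coefficient_energies(model, conv_layer_name, n_up_channels):
    """Frobenius norm of the first post-concat conv's kernel, split
    into the slice applied to transpose-conv channels vs the slice
    applied to skip channels. Keras Conv2D kernels have shape
    (kh, kw, C_in, C_out).
    """
    kernel = model.get_layer(conv_layer_name).get_weights()[0]
    up_norm = np.linalg.norm(kernel[:, :, :n_up_channels, :])
    skip_norm = np.linalg.norm(kernel[:, :, n_up_channels:, :])
    return up_norm, skip_norm
```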
If you try any of these experiments, let us know what you find! Science!
I should also mention that the U-Net material is brand new as of the big April 2021 update to the DLS courses, so I haven’t lived with it for very long and haven’t done any exploration beyond what Prof Ng says in the lectures. Consider that a disclaimer that I don’t really know what I’m talking about here.