U-Net Architecture "up-conv" operation

It’s also worth noting that it is completely common and “plain vanilla” for algorithms to get improved over time from what was in the original paper. As things get deployed and used at scale, people come up with improvements. E.g., how many versions of YOLO have there been now since the original 2015 paper?

Another interesting and nice clear example of such an improvement is “inverted” dropout. Go back and read the original Srivastava, Hinton, Krizhevsky, Sutskever and Salakhutdinov paper that introduced dropout and notice that they hadn’t thought of the “inverted” idea yet, so they had to downscale the weights at inference time instead. The way it is done now, where we multiply by 1/keep_prob at training time so the expected values already match, is just so much cleaner and simpler. I’ll bet Hinton does a big Homer Simpson “D’oh!” every time he remembers that oversight. :laughing: Here’s a thread that discusses this point in more detail.
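Here’s a minimal NumPy sketch of the difference, just to make it concrete (the array shape, keep_prob value, and variable names are purely illustrative, not from any particular implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
keep_prob = 0.8
a = rng.standard_normal((4, 5))      # activations of some hidden layer

mask = rng.random(a.shape) < keep_prob

# Original dropout: drop units at training time, then downscale the
# activations (equivalently, the weights) by keep_prob at inference time.
a_train_original = a * mask           # training forward pass
a_test_original = a * keep_prob       # inference needs this extra rescaling

# Inverted dropout: rescale by 1/keep_prob at training time, so the
# expected value already matches and inference needs no special handling.
a_train_inverted = (a * mask) / keep_prob   # training forward pass
a_test_inverted = a                          # inference: use activations as-is
```

Both versions give the same expected activations at test time; inverted dropout just moves the bookkeeping into the training pass, which keeps the inference path completely clean.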
