Question about the Loss Derivative in Gatys's Style Transfer Paper

In Gatys's paper “A Neural Algorithm of Artistic Style”, he gives the following content loss function and its derivative with respect to the generated image's feature activations at a given layer:
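(The equations didn't come through above, so here they are as I recall them from the paper, where $F^l$ are the generated image's feature responses at layer $l$ and $P^l$ are the content image's; please check against the original if you spot a discrepancy.)

$$
\mathcal{L}_{\text{content}}(\vec{p}, \vec{x}, l) = \frac{1}{2} \sum_{i,j} \left( F^{l}_{ij} - P^{l}_{ij} \right)^2
$$

$$
\frac{\partial \mathcal{L}_{\text{content}}}{\partial F^{l}_{ij}} =
\begin{cases}
\left( F^{l} - P^{l} \right)_{ij} & \text{if } F^{l}_{ij} > 0 \\
0 & \text{if } F^{l}_{ij} < 0
\end{cases}
$$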

Why does he set the derivative to zero when the feature-level activations become negative?

Hmm, it's an interesting question and needs some research. What comes to my mind is the ReLU activation, where negative activation values are clipped to zero.

And I am thinking that if F is negative, say F = -|F|, then (F - P)^2 = (|F| + P)^2, which grows as F moves further below zero; we need F to be positive so that (F - P)^2 keeps decreasing over time. So I think he clips those derivatives to 0. A negative value of F would mean we have moved in the completely opposite direction from our target.

I also started thinking ReLU-ish thoughts, but ReLUs are used to introduce non-linearity while keeping a region of linear responsiveness, and to avoid the saturation region of sigmoid-like functions.

I don't entirely follow your second thought, but I'm not certain we need F to be positive. I think both F and P could be either positive or negative, unless they've both been ReLU-clipped, in which case it's a moot point.

But the thing we're trying to minimize is 0.5 * (F - P)^2. By forcing dL/dF to be zero when F is negative, we're saying that once F is no longer positive, that activation is simply excluded from the minimization.

I suppose the real answer is just to try both options in code and see what happens; a quick sketch of that experiment is below.
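Here's a minimal, self-contained sketch of what I mean (plain NumPy, toy random feature maps rather than real VGG activations, and it optimizes the feature map directly instead of an image, so take it as illustrative only). It compares the plain gradient of 0.5 * (F - P)^2 with the clipped variant where dL/dF is zeroed wherever F is negative:

```python
import numpy as np

def grad_plain(F, P):
    # Standard derivative of 0.5 * (F - P)**2 with respect to F.
    return F - P

def grad_clipped(F, P):
    # Same derivative, but zeroed wherever the activation F is negative,
    # mimicking the clipped derivative in the paper.
    g = F - P
    g[F < 0] = 0.0
    return g

rng = np.random.default_rng(0)
# Pretend "content" features: non-negative, as they would be after a ReLU.
P = np.abs(rng.normal(size=(4, 4)))
# Pretend "generated" features: can start out negative.
F = rng.normal(size=(4, 4))

F_plain, F_clip = F.copy(), F.copy()
lr = 0.1
for _ in range(200):
    F_plain = F_plain - lr * grad_plain(F_plain, P)
    F_clip = F_clip - lr * grad_clipped(F_clip, P)

print("plain   loss:", 0.5 * np.sum((F_plain - P) ** 2))
print("clipped loss:", 0.5 * np.sum((F_clip - P) ** 2))
```

With the plain gradient the loss goes to (near) zero, since every entry of F is pulled straight toward P. With the clipped gradient, any entry that starts negative never receives an update, so the loss stalls for those entries. Whether that matters in the real algorithm is a separate question, since there the gradient flows back through the network to the image rather than updating F directly.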
