About the gradient penalty

  • Question 1:

    When we calculate the norm of the gradient with
    “gradient_norm = gradient.norm(2, dim=1)”,
    what do the parameters “2” and “dim=1” mean?

  • Question 2:

    Why do we do “gradient = gradient.view(len(gradient), -1)”?
    I think tensor.view(len(tensor), -1) yields the same size as the original tensor. Please tell me where my understanding is wrong.

Thank you!

The comments in the template code answer most of your questions. Here’s that section with the comments:

    # Flatten the gradients so that each row captures one image
    gradient = gradient.view(len(gradient), -1)

    # Calculate the magnitude of every row
    gradient_norm = gradient.norm(2, dim=1)

The point is that we are dealing with 4D tensors here. The purpose of the view() is to “unroll” each one into a 2D tensor. For Question 1, the 2 means you want the “2-norm”, which is the Euclidean length if the input were a vector. The dim=1 says you are treating each row of the flattened tensor as a separate input and computing the Euclidean length (2-norm) of each row. So the result will be a 1D tensor with the number of entries equal to the number of rows. Here’s a little snippet of code to show the behavior:

import torch

# Stand-in for the 4D gradient: a batch of 256 images, each 3 x 16 x 32
foo = torch.zeros(256, 3, 16, 32)
print(foo.shape)
print(f"len(foo) {len(foo)}")
# Flatten everything after the batch dimension into one row per image
viewFoo = foo.view(len(foo), -1)
print(viewFoo.shape)
# 2-norm of each row
normViewFoo = viewFoo.norm(2, dim=1)
print(normViewFoo.shape)

Running that gives this output:

torch.Size([256, 3, 16, 32])
len(foo) 256
torch.Size([256, 1536])
torch.Size([256])
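
If it helps to see why the per-row norm matters, here is a minimal sketch of my own (not the template code). It shows that the 2-norm of a plain vector is just the Euclidean length, and how those per-image norms are typically used afterwards, assuming the usual WGAN-GP-style penalty of mean((norm - 1)^2); the gradient_norm values below are made up for illustration.

import torch

# A plain vector: the 2-norm is the Euclidean length, sqrt(3^2 + 4^2) = 5
vec = torch.tensor([3.0, 4.0])
print(vec.norm(2))             # tensor(5.)

# Pretend per-image gradient norms (the 1D tensor from the snippet above)
gradient_norm = 1.5 * torch.ones(256)

# WGAN-GP-style penalty: how far each norm is from 1, averaged over the batch
penalty = torch.mean((gradient_norm - 1) ** 2)
print(penalty)                 # tensor(0.2500)

The key point is that the penalty is computed from one scalar norm per image, which is exactly why the gradient is flattened to one row per image first.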