The comments in the template code answer most of your questions. Here’s that section with the comments:
# Flatten the gradients so that each row captures one image
gradient = gradient.view(len(gradient), -1)
# Calculate the magnitude of every row
gradient_norm = gradient.norm(2, dim=1)
The point is that we are dealing with 4D tensors here. The purpose of the view() there is to “unroll” each image into a single row, which gives a 2D tensor. For question 1), the 2 there means you want the “2-norm”, which is the Euclidean length when the input is a vector. The dim=1 says to take that norm along each row of the flattened tensor, so each row is treated as a separate vector. The result is a 1D tensor with one entry per row. Here’s a little snippet of code to show the behavior:
import torch

foo = torch.zeros(256, 3, 16, 32)
print(foo.shape)
print(f"len(foo) {len(foo)}")
viewFoo = foo.view(len(foo), -1)
print(viewFoo.shape)
normViewFoo = viewFoo.norm(2, dim=1)
print(normViewFoo.shape)
Running that gives this output:
torch.Size([256, 3, 16, 32])
len(foo) 256
torch.Size([256, 1536])
torch.Size([256])
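The snippet above only checks shapes, and since foo is all zeros every norm comes out as 0. If you want to convince yourself that norm(2, dim=1) really is the Euclidean length of each row, here is a minimal sketch that compares it against the square root of the sum of squares. The tensor grad and its small shape are made up for illustration; they are not from the assignment.

```python
import torch

torch.manual_seed(0)  # fixed seed so repeated runs use the same values

# Hypothetical stand-in for a batch of gradients: 4 "images" of shape 3 x 2 x 2
grad = torch.randn(4, 3, 2, 2)

# Same two steps as the template code
flat = grad.view(len(grad), -1)   # shape (4, 12): one row per image
row_norms = flat.norm(2, dim=1)   # shape (4,): one norm per row

# Euclidean length computed by hand: sqrt of the sum of squares along each row
manual = flat.pow(2).sum(dim=1).sqrt()

print(flat.shape)
print(row_norms.shape)
print(torch.allclose(row_norms, manual))
```

The last line should print True, since the two computations are the same thing up to floating point rounding.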