Hi community,
I am taking a computer vision class and am trying to implement the naive (loop-based) SVM loss. The goal is to compute the gradient of the SVM term of the loss function at the same time as the loss itself is being computed.
Here is the function code:
import torch


def svm_loss_naive(
    W: torch.Tensor, X: torch.Tensor, y: torch.Tensor, reg: float
):
    """
    Structured SVM loss function, naive implementation (with loops).

    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples. When you implement the regularization over W, please DO NOT
    multiply the regularization term by 1/2 (no coefficient).

    Inputs:
    - W: A PyTorch tensor of shape (D, C) containing weights.
    - X: A PyTorch tensor of shape (N, D) containing a minibatch of data.
    - y: A PyTorch tensor of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as torch scalar
    - gradient of loss with respect to weights W; a tensor of same shape as W
    """
    dW = torch.zeros_like(W)  # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = W.t().mv(X[i])
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                loss += margin
                #######################################################################
                # TODO: #
                # Compute the gradient of the SVM term of the loss function and store #
                # it on dW. (part 1) Rather than first computing the loss and then #
                # computing the derivative, it is simple to compute the derivative #
                # at the same time that the loss is being computed. #
                #######################################################################
                # Replace "pass" statement with your code
                pass
                #######################################################################
                # END OF YOUR CODE #
                #######################################################################

    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss /= num_train

    # Add regularization to the loss.
    loss += reg * torch.sum(W * W)

    #############################################################################
    # TODO: #
    # Compute the gradient of the loss function w.r.t. the regularization term #
    # and add it to dW. (part 2) #
    #############################################################################
    # Replace "pass" statement with your code
    pass
    #############################################################################
    # END OF YOUR CODE #
    #############################################################################

    return loss, dW
Here is how the class defines the loss and the gradient of the loss with respect to the weights:
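The formulas from the class did not come through in this post. Assuming they are the standard multiclass SVM (hinge) loss that the code above computes, they would read roughly as below. The notation is an assumption made here, not taken from the class: s = W^T x_i is the score vector (`scores` in the code), w_j is the j-th column of W, and lambda stands for `reg`.

```latex
% Per-example loss, with delta = 1 as in the code
L_i = \sum_{j \neq y_i} \max\left(0,\; s_j - s_{y_i} + 1\right),
\qquad s = W^{\top} x_i

% Full loss: average over the N examples plus L2 regularization (no 1/2 factor)
L = \frac{1}{N} \sum_{i=1}^{N} L_i + \lambda \sum_{d,c} W_{d,c}^{2}

% Gradient of L_i with respect to a wrong-class column w_j, j \neq y_i
\nabla_{w_j} L_i = \mathbb{1}\left[s_j - s_{y_i} + 1 > 0\right] x_i

% Gradient of L_i with respect to the correct-class column w_{y_i}
\nabla_{w_{y_i}} L_i = -\Big(\sum_{j \neq y_i} \mathbb{1}\left[s_j - s_{y_i} + 1 > 0\right]\Big)\, x_i
```

Read this way, every positive margin contributes +x_i to column j and -x_i to column y_i, and the regularization term contributes 2 * lambda * W.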
I wrote this for the first placeholder:
perturbation = 0.0001
perturbated_loss = 0.0
perturbated_W = W[y[j]] + perturbation
perturbated_margin = perturbated_W - correct_class_score + 1
perturbated_loss += perturbated_margin
dW = (perturbated_loss - loss) / perturbation # Shape: 10
but I don’t really understand how to get the i-th element in this case to compute the loss.
Thank you for your assistance.
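For comparison, here is a minimal sketch of how the gradient could be accumulated analytically inside the same loops instead of by perturbation. It assumes the standard hinge-loss derivative: each positive margin adds X[i] to column j of dW and subtracts X[i] from column y[i]. The names `svm_loss_and_grad` and `numeric_grad` are hypothetical and not part of the assignment. The finite-difference helper is included only as a sanity check; it perturbs one entry of W at a time and recomputes the whole loss.

```python
import torch


def svm_loss_and_grad(W, X, y, reg):
    """Loop-based SVM loss with the analytic gradient accumulated inline.

    Hypothetical helper: W is (D, C), X is (N, D), y is (N,), reg is a float.
    """
    dW = torch.zeros_like(W)
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = W.t().mv(X[i])
        yi = int(y[i])  # plain Python int for indexing and comparison
        correct_class_score = scores[yi]
        for j in range(num_classes):
            if j == yi:
                continue
            margin = scores[j] - correct_class_score + 1  # delta = 1
            if margin > 0:
                loss += margin
                # margin = W[:, j] . X[i] - W[:, yi] . X[i] + 1, so its
                # derivative is +X[i] for column j and -X[i] for column yi.
                dW[:, j] += X[i]
                dW[:, yi] -= X[i]
    # The loss is averaged over the minibatch, so the gradient is too.
    loss /= num_train
    dW /= num_train
    # Regularization reg * sum(W * W) has gradient 2 * reg * W.
    loss += reg * torch.sum(W * W)
    dW += 2 * reg * W
    return loss, dW


def numeric_grad(W, X, y, reg, h=1e-5):
    """Central finite differences, one entry of W at a time (slow; check only)."""
    grad = torch.zeros_like(W)
    for d in range(W.shape[0]):
        for c in range(W.shape[1]):
            old = W[d, c].item()
            W[d, c] = old + h
            loss_plus, _ = svm_loss_and_grad(W, X, y, reg)
            W[d, c] = old - h
            loss_minus, _ = svm_loss_and_grad(W, X, y, reg)
            W[d, c] = old  # restore the original weight
            grad[d, c] = (loss_plus - loss_minus) / (2 * h)
    return grad
```

On small float64 inputs the two gradients should agree closely. A perturbation-based estimate needs the full loss recomputed for each perturbed weight, which is why it is usually kept for checking rather than used inside the training loop.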