# Concept explanation

Hi community,
I am taking a computer vision class and am trying to implement the naive SVM loss. The aim is to compute the gradient of the SVM term of the loss function, accumulating the derivative at the same time that the loss is being computed.
Here is the function code:


import torch


def svm_loss_naive(
    W: torch.Tensor, X: torch.Tensor, y: torch.Tensor, reg: float
):
    """
    Structured SVM loss function, naive implementation (with loops).

    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples. When you implement the regularization over W, please DO NOT
    multiply the regularization term by 1/2 (no coefficient).

    Inputs:
    - W: A PyTorch tensor of shape (D, C) containing weights.
    - X: A PyTorch tensor of shape (N, D) containing a minibatch of data.
    - y: A PyTorch tensor of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as torch scalar
    - gradient of loss with respect to weights W; a tensor of same shape as W
    """
    dW = torch.zeros_like(W)  # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = W.t().mv(X[i])
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                loss += margin
                #######################################################################
                # TODO:                                                               #
                # Compute the gradient of the SVM term of the loss function and store #
                # it on dW. (part 1) Rather than first computing the loss and then    #
                # computing the derivative, it is simple to compute the derivative    #
                # at the same time that the loss is being computed.                   #
                #######################################################################
                # Replace "pass" statement with your code
                pass
                #######################################################################
                #                       END OF YOUR CODE                              #
                #######################################################################

    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss /= num_train

    # Add regularization to the loss.
    loss += reg * torch.sum(W * W)

    #############################################################################
    # TODO:                                                                     #
    # Compute the gradient of the loss function w.r.t. the regularization term  #
    # and add it to dW. (part 2)                                                #
    #############################################################################
    # Replace "pass" statement with your code
    pass
    #############################################################################
    #                             END OF YOUR CODE                              #
    #############################################################################

    return loss, dW



In the class here is how the loss and the gradient with respect to the loss are defined:

I wrote this for the first placeholder:

perturbation = 0.0001
perturbated_loss = 0.0
perturbated_W = W[y[j]] + perturbation
perturbated_margin = perturbated_W - correct_class_score + 1
perturbated_loss += perturbated_margin
dW = (perturbated_loss - loss) / perturbation # Shape: 10



but I don’t really understand how to get the ith element in this case to compute the loss.

Difficult to help with this, as SVM is not used very much now.

Thank you @TMosh for your reply. I’m new to computer vision, so I want to learn and understand.

Computer vision often uses a Convolutional Neural Network. Covered in the Deep Learning Specialization.

To compute the gradient of the SVM term while the loss is being computed, accumulate the gradient with respect to the weights `W` inside the same loops over training examples and classes. Your finite-difference approach (perturbing the weights and measuring the change in loss) is not necessary here, because the gradient can be derived directly from the loss formula. Here is the complete implementation, with explanations after the code:

import torch


def svm_loss_naive(
    W: torch.Tensor, X: torch.Tensor, y: torch.Tensor, reg: float
):
    """
    Structured SVM loss function, naive implementation (with loops).

    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples. When you implement the regularization over W, please DO NOT
    multiply the regularization term by 1/2 (no coefficient).

    Inputs:
    - W: A PyTorch tensor of shape (D, C) containing weights.
    - X: A PyTorch tensor of shape (N, D) containing a minibatch of data.
    - y: A PyTorch tensor of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as torch scalar
    - gradient of loss with respect to weights W; a tensor of same shape as W
    """
    dW = torch.zeros_like(W)  # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = W.t().mv(X[i])
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                loss += margin
                # Gradient of the SVM term: margin = W[:, j].X[i] - W[:, y[i]].X[i] + 1
                dW[:, j] += X[i]  # d(margin)/dW[:, j] = X[i]
                dW[:, y[i]] -= X[i]  # d(margin)/dW[:, y[i]] = -X[i]

    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss /= num_train
    dW /= num_train  # Average the gradient as well

    # Add regularization to the loss.
    loss += reg * torch.sum(W * W)

    # Gradient of the regularization term reg * sum(W * W) is 2 * reg * W
    dW += 2 * reg * W

    return loss, dW


### Explanation:

1. Gradient calculation during loss computation:
   - When the margin is greater than 0, class `j` scores within 1 of (or above) the correct class, which violates the SVM constraint, so the loss is incremented by the margin.
   - At the same time, the gradient is updated:
     - For the incorrect class `j`, we add `X[i]` to `dW[:, j]`, because the margin contains `scores[j] = W[:, j] · X[i]`, whose derivative with respect to `W[:, j]` is `X[i]`.
     - For the correct class `y[i]`, we subtract `X[i]` from `dW[:, y[i]]`, because the margin contains `-correct_class_score`: moving `W[:, y[i]]` in the direction of `X[i]` raises the correct-class score and therefore lowers the margin.
2. Averaging the loss and gradient:
   - The computed loss is divided by the number of training examples to get the average loss.
   - Similarly, the gradient `dW` is divided by the number of training examples.
3. Regularization:
   - The regularization term `reg * torch.sum(W * W)` is added to the loss.
   - The gradient of the regularization term with respect to `W` is `2 * reg * W`, which is added to `dW`.
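For reference, the three steps above can be written out as a short derivation. This is my own notation rather than the course's: I am assuming $x_i$ stands for `X[i]`, $s_j = W_{:,j}^\top x_i$ for `scores[j]`, $\lambda$ for `reg`, and $\mathbb{1}[\cdot]$ for the indicator that a margin is positive.

```latex
% Per-example hinge loss (delta = 1), with scores s_j = W_{:,j}^\top x_i
L_i = \sum_{j \neq y_i} \max\bigl(0,\; s_j - s_{y_i} + 1\bigr)

% Columns for the incorrect classes j \neq y_i
\frac{\partial L_i}{\partial W_{:,j}}
  = \mathbb{1}\bigl[s_j - s_{y_i} + 1 > 0\bigr]\, x_i

% Column for the correct class: one -x_i per violated margin
\frac{\partial L_i}{\partial W_{:,y_i}}
  = -\Bigl(\sum_{j \neq y_i} \mathbb{1}\bigl[s_j - s_{y_i} + 1 > 0\bigr]\Bigr)\, x_i

% Full objective (no 1/2 on the regularizer) and its gradient
L = \frac{1}{N}\sum_{i=1}^{N} L_i + \lambda \sum_{d,c} W_{d,c}^2,
\qquad
\nabla_W L = \frac{1}{N}\sum_{i=1}^{N} \nabla_W L_i + 2\lambda W
```

Each positive margin contributes one $+x_i$ to column $j$ and one $-x_i$ to column $y_i$, which is exactly what the two `dW` lines inside the `if margin > 0` branch accumulate.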

This computes the gradient in the same pass as the loss, so no finite-difference approximation or second loop over the data is needed.
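The perturbation idea from the question is still handy as a sanity check: perturb one weight at a time, re-evaluate the whole loss, and compare against the analytic gradient. Below is a minimal sketch of that check, assuming plain Python lists in place of torch tensors so that it runs without any dependencies. The names `svm_loss_and_grad` and `numeric_grad` and the toy data are made up for illustration and are not part of the assignment.

```python
# Dependency-free sketch: nested lists stand in for torch tensors.
# W is D x C (list of rows), X is N x D, y is a list of N labels.

def svm_loss_and_grad(W, X, y, reg):
    """Same math as svm_loss_naive, written with nested lists (illustrative)."""
    D, C, N = len(W), len(W[0]), len(X)
    dW = [[0.0] * C for _ in range(D)]
    loss = 0.0
    for i in range(N):
        scores = [sum(X[i][d] * W[d][c] for d in range(D)) for c in range(C)]
        for j in range(C):
            if j == y[i]:
                continue
            margin = scores[j] - scores[y[i]] + 1.0  # delta = 1
            if margin > 0:
                loss += margin
                for d in range(D):
                    dW[d][j] += X[i][d]      # incorrect class column
                    dW[d][y[i]] -= X[i][d]   # correct class column
    loss /= N
    loss += reg * sum(w * w for row in W for w in row)
    for d in range(D):
        for c in range(C):
            dW[d][c] = dW[d][c] / N + 2.0 * reg * W[d][c]
    return loss, dW


def numeric_grad(W, X, y, reg, h=1e-5):
    """Central differences: perturb one weight, re-evaluate the full loss."""
    grad = [[0.0] * len(W[0]) for _ in W]
    for d in range(len(W)):
        for c in range(len(W[0])):
            old = W[d][c]
            W[d][c] = old + h
            loss_plus, _ = svm_loss_and_grad(W, X, y, reg)
            W[d][c] = old - h
            loss_minus, _ = svm_loss_and_grad(W, X, y, reg)
            W[d][c] = old  # restore the original weight
            grad[d][c] = (loss_plus - loss_minus) / (2 * h)
    return grad


# Made-up toy data: D = 3 features, C = 3 classes, N = 2 examples
W = [[0.2, -0.5, 0.1], [0.4, 0.3, -0.2], [-0.3, 0.1, 0.5]]
X = [[1.0, 2.0, -1.0], [0.5, -1.5, 2.0]]
y = [1, 2]
reg = 0.1

loss, dW = svm_loss_and_grad(W, X, y, reg)
dW_num = numeric_grad(W, X, y, reg)
max_err = max(abs(a - b) for ra, rb in zip(dW, dW_num) for a, b in zip(ra, rb))
print("max abs difference:", max_err)
```

Note that the check perturbs a single entry of `W` and recomputes the entire loss, whereas the attempt in the question added the perturbation to a row of `W` and then used that row directly as a score. The two gradients should agree closely as long as no margin sits right at the hinge (margin close to 0), where the loss is not differentiable.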