YOLOv5 Output / Model Not Learning with Custom Data

I am training YOLOv5 on a custom dataset. I believe the output of the model is a number of bounding boxes. I then calculate the loss from the class_label, x, y, w, h of each predicted box and each expected label. In essence, I loop through each bounding box and compare its values with each of the expected output labels. This is the code:
```python
epochs = 10
learning_rate = 0.001
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
loss_function = nn.MSELoss()
#Training loop

for epoch in range(epochs):

train_loss = 0.0

# Training

model.train()
for image_tensor, label_tensor in train_image_label_pairs.items():
    optimizer.zero_grad()

    # Forward pass
    output = model(image_tensor)

    # Extract bounding box coordinates (x, y, w, h)
    bounding_boxes = output[0][..., :5]  # Shape: (1, 3, 52, 52, 4)
    bbox_tensors = []
    for box in bounding_boxes:
        class_label = box[0][0][0][0].item()
        x = box[0][0][0][1].item()
        y = box[0][0][0][2].item()
        w = box[0][0][0][3].item()
        h = box[0][0][0][4].item()
        box_tensor = torch.tensor([class_label, x, y, w, h], requires_grad=True)
        bbox_tensors.append(box_tensor)
    # Convert the list of tensors to a single tensor
    bbox_tensors = torch.stack(bbox_tensors)

    # Calculate the loss
    for label in label_tensor:
        loss = loss_function(bbox_tensors[0], label)

        # Backward pass and optimization
        loss.backward()
        optimizer.step()
        # Accumulate the training loss
    train_loss += loss.item()
# Compute the average training loss for the epoch
avg_train_loss = train_loss / len(train_image_label_pairs)
# Print the training loss and validation loss (if applicable)
print(f"Epoch [{epoch+1}/{epochs}], Train Loss: {avg_train_loss}")
```

The output here is consistently this: Epoch [1/10], Train Loss: 23.490465210034298, with the same loss for all the epochs. Does this mean that the model is not learning? Also, if anyone knows what each of the values in the YOLOv5 bounding box output represents, kindly let me know, as this could really help. This is an example of one: [ 0.19493, -0.01519, -0.04640, …, -6.55874, -6.98538, -5.91408]. Also, this is not related to the assignment.

Because of the disjointed way you “copy/pasted” your code, it’s a little hard to know what to say. Of course the point is that indentation is key to Python syntax, particularly as it pertains to how loops are structured. Just to point out one instance, notice that the increment to train_loss does not happen inside the loop over the label values in label_tensor, right? So for each image it only gets incremented with the loss from the last iteration of that loop. There is no accumulation happening within the loop. I’m hoping this is just a “cut-n-paste” artifact.

Maybe there are similar errors w.r.t. the loop over the training epochs.
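To make the accumulation point concrete, here is a minimal sketch in plain Python. The list `label_losses` is a hypothetical stand-in for the per-label loss values of one image; it is not taken from the original code.

```python
# Hypothetical per-label loss values for one image, for illustration only.
label_losses = [4.0, 2.5, 1.5]

# As posted: the increment sits after the inner loop,
# so only the loss from the last iteration is added.
train_loss_outside = 0.0
for loss in label_losses:
    pass  # the backward pass and optimizer step would go here
train_loss_outside += loss

# Increment moved inside the loop: every label's loss is accumulated.
train_loss_inside = 0.0
for loss in label_losses:
    train_loss_inside += loss

print(train_loss_outside)  # 1.5
print(train_loss_inside)   # 8.0
```

Which of the two is intended depends on how the average is computed afterwards; the posted code divides by the number of images, not by the number of labels.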

Also it seems kind of backwards for you to be asking us to explain your data to you. I don’t know anything about YOLOv5. If it is available for you to import and use, there must be some supporting documentation. Or at least a GitHub repo with the source code and examples of the data. Seems like that is the place to go for more information. Maybe you’ll get lucky and there is someone else listening here who knows more about YOLOv5 than I do, but I think expecting someone else to read the documentation and then explain it to you is not the right model. The point of doing projects on your own is that it’s a learning experience and in addition to the specific topical information that you learn, you are also developing your skills for how to approach this kind of problem solving in general.

The famous proverb most commonly attributed to the philosopher Lao Tzu is a useful way to think about this point:

“If I give a man a fish, he will not be hungry today. If I teach a man to fish, then he will never be hungry again.”

So please keep in mind that what you are trying to do here is learn to fish. :nerd_face:


Hi,
In addition to what Paul said, I want to add a few points about your approach.
The first inference you can draw when the loss doesn’t change is that there is an error in the loss function. Be mindful of all the losses that make up the main loss function; in the case of YOLOv5 these are the bounding box loss, the objectness loss and the classification loss. If this is custom code, check the loss function: if something goes wrong in reshaping or a similar step, it might always return a vector of zeroes or some constant, whatever values you plug in. Also, as Paul said, it is very difficult to find your problem from just a snippet of your code, so make sure that you at least are clear about how you are implementing it.
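One way to check a custom loss along these lines is to confirm that gradients actually reach the model parameters after `loss.backward()`. The sketch below assumes PyTorch and uses a hypothetical `nn.Linear` layer as a stand-in for the real model; the helper names `grads_reach_model`, `rebuilt` and `sliced` are made up for illustration. It compares a prediction rebuilt from `.item()` values (the pattern in the posted code) with a prediction sliced directly from the model output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the real model: a single linear layer.
model = nn.Linear(4, 5)
loss_function = nn.MSELoss()
x = torch.randn(1, 4)
target = torch.randn(5)

def grads_reach_model(build_prediction):
    """Return True if backward() leaves a non-zero gradient on any model parameter."""
    model.zero_grad(set_to_none=True)
    prediction = build_prediction(model(x))
    loss = loss_function(prediction, target)
    loss.backward()
    return any(
        p.grad is not None and bool(p.grad.abs().sum() > 0)
        for p in model.parameters()
    )

def rebuilt(output):
    # Pattern from the question: copy values out with .item() and wrap them
    # in a new tensor. The new tensor is a leaf with no link to the model.
    values = [output[0][i].item() for i in range(5)]
    return torch.tensor(values, requires_grad=True)

def sliced(output):
    # Slice the output directly so it stays in the autograd graph.
    return output[0, :5]

rebuilt_ok = grads_reach_model(rebuilt)
sliced_ok = grads_reach_model(sliced)
print(rebuilt_ok, sliced_ok)
```

If the first check reports no gradient, `optimizer.step()` has nothing to apply to the model, and the loss would stay the same from epoch to epoch.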

YOLOv5 is an open-source model. The Ultralytics team, who developed it, has a GitHub repo that you can refer to in order to learn how they implement it.

Also, make sure you know the basics very clearly and don’t just dive directly into using these complex models without knowing how they work. As a learner myself, I sometimes get attracted to doing these things too, but it doesn’t feel right unless we understand it clearly. And as the great Paul says: “The point of doing projects on our own is that it’s a learning experience and in addition to the specific topical information that you learn, you are also developing your skills for how to approach this kind of problem-solving in general.” So try to figure out what goes wrong by exploring more on your own, and try to learn instead of looking for an immediate solution when you do projects on your own. It is time-consuming, but it seriously improves our thought process (sharing my own experience).
Cheers,
Nithin
