Y_pred is always essentially 0 or 1. Sometimes it takes on values like
[1.0000e+00],
[9.6993e-38],
[0.0000e+00],
[1.0000e+00],
[1.0000e+00],
[0.0000e+00],
[1.0000e+00],
[0.0000e+00],
[1.0000e+00],
[1.0000e+00],
[0.0000e+00],
[1.0000e+00],
But I thought your accuracy results indicated that you are always predicting False …
I ran it multiple times, and the output changes each time.
import numpy as np
import copy
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from lr_utils import load_dataset
from public_tests import *
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
num_epochs = 5000
# Loading the data (cat/non-cat)
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
# Example of a picture
# index = 25
# plt.imsave("example.png", train_set_x_orig[index])
# print ("y = " + str(train_set_y[:, index]) + ", it's a '" + classes[np.squeeze(train_set_y[:, index])].decode("utf-8") + "' picture.")
# Understand the shape of dataset
# print(f"The dimensions X of the train set {train_set_x_orig.shape}")
# print(f"The dimensions Y of the train set {train_set_y.shape}")
# print(f"The dimensions X of the test set {test_set_x_orig.shape}")
# print(f"The dimensions Y of the test set {test_set_y.shape}")
# Flatten X for train and test
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1) # pytorch uses (batch_size, dim)
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1) # pytorch uses (batch_size, dim)
# Reshape from (batch_size, dim) to (dim, batch_size). Note Y is already in this format so we don't convert
# train_set_x_flatten = train_set_x_flatten.T
# test_set_x_flatten = test_set_x_flatten.T
# Understand the shape of dataset
print(f"The dimensions X of the train set {train_set_x_flatten.shape}")
print(f"The dimensions Y of the train set {train_set_y.shape}")
print(f"The dimensions X of the test set {test_set_x_flatten.shape}")
print(f"The dimensions Y of the test set {test_set_y.shape}")
count_zero = 0
count_one = 0
total_count = 0
for val in train_set_y:
    if val[0] == 0:
        count_zero += 1
    else:
        count_one += 1
    total_count += 1
print(f" % of cats in train : {count_one/total_count:.2f}")
print(f" % of not cats in train : {count_zero/total_count:.2f}")
# Standardize dataset
train_set_x = train_set_x_flatten / 255.
test_set_x = test_set_x_flatten / 255.
class NN(nn.Module):
    def __init__(self, input_size):
        super(NN, self).__init__()
        self.fc1 = nn.Linear(input_size, 1, bias=1)

    def forward(self, x):
        x = self.fc1(x)
        x = F.sigmoid(x)
        return x
# set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
# Initialize network
model = NN(input_size = train_set_x.shape[1]).to(device)
# Check final shape
test = np.random.randn(train_set_x.shape[0], train_set_x.shape[1]) # (batch_size, dim)
# test = torch.from_numpy(test).to(device)
# test = test.to(torch.float32)
# or
test = torch.tensor(test, dtype=torch.float32).to(device)
assert test.dtype == torch.float32, "Linear layer is float32, convert input to float32 as well"
result = model(test)
np_result = result.detach().cpu().numpy()
print(np_result.shape) # (batch_size, 1), i.e. (209, 1)
assert np_result.shape[0] == train_set_y.shape[0] and np_result.shape[1] == train_set_y.shape[1], "Output dimensions are not correct"
# Hyper-parameters
learning_rate = 0.001
# Setting up options
criterion = torch.nn.BCELoss() # Binary Cross-Entropy Loss
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
# Running test
def check_accuracy(X, Y):
    with torch.no_grad():
        X = torch.tensor(X, dtype=torch.float32).to(device)
        Y = torch.tensor(Y, dtype=torch.float32).to(device)
        Y_pred = model(X)
        print(Y_pred)
        Y_pred_bool = Y_pred > 0.5
        accuracy = (Y_pred == Y).sum() / X.shape[0]
        return accuracy
# Using batch gradient descent, i.e. all the data is used in every epoch
for epoch in range(num_epochs):
    train_set_x = torch.tensor(train_set_x, dtype=torch.float32).to(device)
    train_set_y = torch.tensor(train_set_y, dtype=torch.float32).to(device)
    prediction = model(train_set_x)
    loss = criterion(prediction, train_set_y)
    # forward
    # optimizer.zero_grad() # we don't need this as we are not using mini-batches
    loss.backward()
    # gradient descent with adam
    optimizer.step()
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')
accuracy_train = check_accuracy(train_set_x, train_set_y)
accuracy_test = check_accuracy(test_set_x, test_set_y)
print(f" Final Train Accuracy : {accuracy_train:.2f}, Final Test Accuracy : {accuracy_test:.2f}")
I want to share a log, but I guess I can’t upload anything.
I experimented with the learning rate. When I set it to 0.001, it gives me all 1s. If I set it to 0.009, I get all 0s for the original Y_pred.
You should be able to “copy/paste” the log the same way you copy/pasted the code. Or you can use the little “Up Arrow” tool to upload a screenshot.
Loss is increasing. That’s not good.
Sorry to interrupt your discussion. I’m curious about this snippet:
# forward
# optimizer.zero_grad() # we don't need this as we are not using mini-batches
Do ensure that optimizer.zero_grad() is called before triggering the backward pass of the loss.
See this:
It is beneficial to zero out gradients when building a neural network. This is because by default, gradients are accumulated in buffers (i.e, not overwritten) whenever .backward() is called.
and the notebook:
train_nn.ipynb (215.8 KB)
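To make the quoted behaviour concrete, here is a minimal, self-contained sketch (a toy example of mine, not taken from the notebook): calling .backward() twice without zeroing in between adds the gradients together instead of overwriting them.

import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 1)      # tiny stand-in for the model
x = torch.randn(4, 3)        # dummy batch

layer(x).sum().backward()    # first backward pass
first_grad = layer.weight.grad.clone()

layer(x).sum().backward()    # second backward pass, no zero_grad() in between
print(torch.allclose(layer.weight.grad, 2 * first_grad))  # True: gradients were accumulated

layer.weight.grad.zero_()    # optimizer.zero_grad() clears these buffers for every parameter
                             # (recent PyTorch versions set .grad to None instead)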
Thanks Paul, TMosh and Balaji for looking into this.
I uncommented the optimizer.zero_grad() line in my code and it worked.
So from what I gather, for each epoch, since we need to update the gradients, we need to clear them first (when using PyTorch)?
Invoke optimizer.zero_grad()
before updating model weights.
Hi,
I randomly stumbled into this discussion. But the last post by Balaji might be misleading to many learners:
I’m sure he meant something different than this quote. In particular, this would never work:
# Backward pass - calculate gradients
loss.backward()
# Zero gradients
optimizer.zero_grad()
# Optimization step - update model weights
optimizer.step()
Here, we would invoke optimizer.zero_grad() before updating the model weights, and this would zero the gradients, so the optimizer would not update any model weights.
You can call optimizer.zero_grad() anywhere in the loop except between the loss.backward() and optimizer.step() calls. In particular, my preferred choice:
# Zero all previous gradients before loss calculation
optimizer.zero_grad()
# Backward pass - calculate gradients
loss.backward()
# Optimization step - update model weights
optimizer.step()
or this is also valid:
# Backward pass - calculate gradients
loss.backward()
# Optimization step - update model weights
optimizer.step()
# Zero gradients after we updated the model weights for next iterations
optimizer.zero_grad()
In simple words, the gradients accumulate, and we need to set them to zero before we repeatedly calculate the loss and gradients (for example, with every mini-batch).
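To double-check the claim that zeroing between loss.backward() and optimizer.step() blocks the update, here is a throwaway sketch (my own, using plain SGD rather than the Adam setup from the thread):

import torch
import torch.nn as nn

layer = nn.Linear(3, 1)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x = torch.randn(4, 3)
before = layer.weight.detach().clone()

layer(x).sum().backward()  # gradients are computed here
opt.zero_grad()            # wrong place: the fresh gradients are wiped out
opt.step()                 # nothing left to apply
print(torch.equal(layer.weight, before))  # True: the weights did not move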
Anyways, just an observation on my side that some learners might misunderstand what Balaji meant.
Cheers
Yes, that’s the way it works in PyTorch. In addition to the discussion from Balaji and Arvydas, here’s a StackExchange article about it.
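For completeness, here is how the training loop from the original post could look with the fix applied (a sketch reusing the model, criterion, optimizer, device and data names from the code above; the *_t names and moving the tensor conversion out of the loop are my own choices):

# Convert to tensors once, outside the loop
train_set_x_t = torch.tensor(train_set_x, dtype=torch.float32).to(device)
train_set_y_t = torch.tensor(train_set_y, dtype=torch.float32).to(device)

for epoch in range(num_epochs):
    optimizer.zero_grad()                        # clear gradients left over from the previous epoch
    prediction = model(train_set_x_t)            # forward pass on the full batch
    loss = criterion(prediction, train_set_y_t)  # binary cross-entropy loss
    loss.backward()                              # backward pass: compute fresh gradients
    optimizer.step()                             # Adam update
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')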