Y_pred is always essentially 0 or 1. Sometimes it takes on values like
[1.0000e+00],
[9.6993e-38],
[0.0000e+00],
[1.0000e+00],
[1.0000e+00],
[0.0000e+00],
[1.0000e+00],
[0.0000e+00],
[1.0000e+00],
[1.0000e+00],
[0.0000e+00],
[1.0000e+00],
But I thought your accuracy results indicated that you are always predicting False …
I ran it multiple times, and the output changes each time.
import numpy as np
import copy
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from lr_utils import load_dataset
from public_tests import *
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
num_epochs = 5000
# Loading the data (cat/non-cat)
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
# Example of a picture
# index = 25
# plt.imsave("example.png", train_set_x_orig[index])
# print ("y = " + str(train_set_y[:, index]) + ", it's a '" + classes[np.squeeze(train_set_y[:, index])].decode("utf-8") + "' picture.")
# Understand the shape of dataset
# print(f"The dimensions X of the train set {train_set_x_orig.shape}")
# print(f"The dimensions Y of the train set {train_set_y.shape}")
# print(f"The dimensions X of the test set {test_set_x_orig.shape}")
# print(f"The dimensions Y of the test set {test_set_y.shape}")
# Flatten X for train and test
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1) # pytorch uses (batch_size, dim)
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1) # pytorch uses (batch_size, dim)
# Reshape from (batch_size, dim) to (dim, batch_size). Note Y is already in this format so we don't convert
# train_set_x_flatten = train_set_x_flatten.T
# test_set_x_flatten = test_set_x_flatten.T
# Understand the shape of dataset
print(f"The dimensions X of the train set {train_set_x_flatten.shape}")
print(f"The dimensions Y of the train set {train_set_y.shape}")
print(f"The dimensions X of the test set {test_set_x_flatten.shape}")
print(f"The dimensions Y of the test set {test_set_y.shape}")
count_zero = 0
count_one = 0
total_count = 0
for val in train_set_y:
    if val[0] == 0:
        count_zero += 1
    else:
        count_one += 1
    total_count += 1
print(f" % of cats in train : {count_one/total_count:.2f}")
print(f" % of not cats in train : {count_zero/total_count:.2f}")
# Standardize dataset
train_set_x = train_set_x_flatten / 255.
test_set_x = test_set_x_flatten / 255.
class NN(nn.Module):
    def __init__(self, input_size):
        super(NN, self).__init__()
        self.fc1 = nn.Linear(input_size, 1, bias=1)

    def forward(self, x):
        x = self.fc1(x)
        x = F.sigmoid(x)
        return x
# set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
# Initialize network
model = NN(input_size = train_set_x.shape[1]).to(device)
# Check final shape
test = np.random.randn(train_set_x.shape[0], train_set_x.shape[1]) # (batch_size, dim)
# test = torch.from_numpy(test).to(device)
# test = test.to(torch.float32)
# or
test = torch.tensor(test, dtype=torch.float32).to(device)
assert test.dtype == torch.float32, "Linear layer is float32, convert input to float32 as well"
result = model(test)
np_result = result.detach().cpu().numpy()
print(np_result.shape) # (batch_size, 1), i.e. (209, 1)
assert np_result.shape[0] == train_set_y.shape[0] and np_result.shape[1] == train_set_y.shape[1], "Output dimensions are not correct"
# Hyper-parameters
learning_rate = 0.001
# Setting up options
criterion = torch.nn.BCELoss() # Binary Cross-Entropy Loss
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
# Running test
def check_accuracy(X, Y):
    with torch.no_grad():
        X = torch.tensor(X, dtype=torch.float32).to(device)
        Y = torch.tensor(Y, dtype=torch.float32).to(device)
        Y_pred = model(X)
        print(Y_pred)
        Y_pred_bool = Y_pred > 0.5
        accuracy = (Y_pred == Y).sum() / X.shape[0]
        return accuracy
# Using batch gradient descent, i.e. all the data is used in every epoch
for epoch in range(num_epochs):
    train_set_x = torch.tensor(train_set_x, dtype=torch.float32).to(device)
    train_set_y = torch.tensor(train_set_y, dtype=torch.float32).to(device)
    prediction = model(train_set_x)
    loss = criterion(prediction, train_set_y)
    # forward
    # optimizer.zero_grad() # we don't need this as we are not using mini-batches
    loss.backward()
    # gradient descent with adam
    optimizer.step()
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')
accuracy_train = check_accuracy(train_set_x, train_set_y)
accuracy_test = check_accuracy(test_set_x, test_set_y)
print(f" Final Train Accuracy : {accuracy_train:.2f}, Final Test Accuracy : {accuracy_test:.2f}")
I want to share a log, but I guess I can’t upload anything.
I experimented with the learning rate. When I set it to 0.001, it gives me all 1s. If I set it to 0.009, I get all 0s for the original Y_pred.
You should be able to “copy/paste” the log the same way you copy/pasted the code. Or you can use the little “Up Arrow” tool to upload a screenshot.
Loss is increasing. That’s not good.
Sorry to interrupt your discussion. I’m curious about this snippet:
# forward
# optimizer.zero_grad() # we don't need this as we are not using mini-batches
Do ensure that optimizer.zero_grad() is called before triggering the backward pass of the loss.
See this:
It is beneficial to zero out gradients when building a neural network. This is because by default, gradients are accumulated in buffers (i.e, not overwritten) whenever .backward() is called.
and the notebook:
train_nn.ipynb (215.8 KB)
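To make the quoted behaviour concrete, here is a minimal, self-contained sketch (a toy example of mine, not taken from the notebook): calling .backward() twice without zeroing in between adds the gradients together instead of overwriting them.

import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 1)      # tiny stand-in for the model
x = torch.randn(4, 3)        # dummy batch

layer(x).sum().backward()    # first backward pass
first_grad = layer.weight.grad.clone()

layer(x).sum().backward()    # second backward pass, no zero_grad() in between
print(torch.allclose(layer.weight.grad, 2 * first_grad))  # True: gradients were accumulated

layer.weight.grad.zero_()    # optimizer.zero_grad() clears these buffers for every parameter
                             # (recent PyTorch versions set .grad to None instead)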
Thanks Paul, TMosh and Balaji for looking into this.
I uncommented the optimizer.zero_grad() line in my code and it worked.
So from what I gather, for each epoch, since we need to update the gradients, we need to clear them first (when using PyTorch)?
Invoke optimizer.zero_grad()
before updating model weights.
Hi,
I randomly stumbled into this discussion. But the last post by Balaji might be misleading to many learners:
I’m sure he meant something different than this quote. In particular, this would never work:
# Backward pass - calculate gradients
loss.backward()
# Zero gradients
optimizer.zero_grad()
# Optimization step - update model weights
optimizer.step()
Here, we would invoke optimizer.zero_grad() before updating the model weights, and this would zero the gradients, so the optimizer would not update any model weights.
You can call optimizer.zero_grad() anywhere in the loop except between the loss.backward() and optimizer.step() calls. In particular, my preferred choice:
# Zero all previous gradients before loss calculation
optimizer.zero_grad()
# Backward pass - calculate gradients
loss.backward()
# Optimization step - update model weights
optimizer.step()
or this is also valid:
# Backward pass - calculate gradients
loss.backward()
# Optimization step - update model weights
optimizer.step()
# Zero gradients after we updated the model weights for next iterations
optimizer.zero_grad()
In simple words, the gradients accumulate, and we need to set them to zero before we repeatedly calculate the loss and gradients (for example, with every mini-batch).
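To double-check the claim that zeroing between loss.backward() and optimizer.step() blocks the update, here is a throwaway sketch (my own, using plain SGD rather than the Adam setup from the thread):

import torch
import torch.nn as nn

layer = nn.Linear(3, 1)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x = torch.randn(4, 3)
before = layer.weight.detach().clone()

layer(x).sum().backward()  # gradients are computed here
opt.zero_grad()            # wrong place: the fresh gradients are wiped out
opt.step()                 # nothing left to apply
print(torch.equal(layer.weight, before))  # True: the weights did not move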
Anyways, just an observation on my side that some learners might misunderstand what Balaji meant.
Cheers
Yes, that’s the way it works in PyTorch. In addition to the discussion from Balaji and Arvydas, here’s a StackExchange article about it.
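For completeness, here is how the training loop from the original post could look with the fix applied (a sketch reusing the model, criterion, optimizer, device and data names from the code above; the *_t names and moving the tensor conversion out of the loop are my own choices):

# Convert to tensors once, outside the loop
train_set_x_t = torch.tensor(train_set_x, dtype=torch.float32).to(device)
train_set_y_t = torch.tensor(train_set_y, dtype=torch.float32).to(device)

for epoch in range(num_epochs):
    optimizer.zero_grad()                        # clear gradients left over from the previous epoch
    prediction = model(train_set_x_t)            # forward pass on the full batch
    loss = criterion(prediction, train_set_y_t)  # binary cross-entropy loss
    loss.backward()                              # backward pass: compute fresh gradients
    optimizer.step()                             # Adam update
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')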