Although I understand that -1 relates to the batch size when flattening for fully connected layers, I'm still a bit confused

Hi everyone,

I came across the upgraded lab of the PyTorch specialisation (Course 2, Week 1, first upgraded lab, "Hyperparameter tuning"), where feature maps are flattened for the fully connected layers.

Here is a screenshot of the code (posted as-is from the upgraded lab):

In the forward method, I noticed -1 being used in the flatten step, which I understood relates to the dimension, or to automatic batch-size handling, for the fully connected layers.

So my doubt is:

Is this -1 related to 32x8x8 = 2048 being split into batches after passing through the flatten layer, before going on to the next fc layer? If anyone can show the calculation from the first CNN layer through to the final output layer, especially at the flatten layer and the significance of -1 there, I would be highly grateful.

@balaji.ambresh @rmwkwok

Regards

DP

Here, -1 ends up being the batch size. When passed to view, -1 tells PyTorch to infer that dimension's size from the tensor's total number of elements and the other specified dimensions. Here’s an example:

a = torch.randn(2, 3, 4)
print(a.view(-1, 4).shape) # torch.Size([6, 4])
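Applied to the shapes from the lab's model (a quick sketch of my own, not taken from the lab): with feature maps of shape (batch, 32, 8, 8), view(-1, 32 * 8 * 8) merges everything except the batch dimension into 2048 features, and -1 is inferred as the batch size:

```python
import torch

# a hypothetical batch of 8 feature maps shaped like the lab's (32, 8, 8)
x = torch.randn(8, 32, 8, 8)
flat = x.view(-1, 32 * 8 * 8)  # -1 is inferred as the batch size, 8
print(flat.shape)              # torch.Size([8, 2048])
```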

Use torchinfo to view the model summary.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32 * 8 * 8, 64)
        self.fc2 = nn.Linear(64, 10)
        self.dropout = nn.Dropout(p=0.4)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 32 * 8 * 8)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x

import torchinfo # pip install torchinfo
model = SimpleCNN()
batch_size = 8
torchinfo.summary(model, input_size=(batch_size, 3, 32, 32), verbose=2);

Output:

==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
SimpleCNN                                [8, 10]                   --
├─Conv2d: 1-1                            [8, 16, 32, 32]           448
│    └─weight                                                      ├─432
│    └─bias                                                        └─16
├─MaxPool2d: 1-2                         [8, 16, 16, 16]           --
├─Conv2d: 1-3                            [8, 32, 16, 16]           4,640
│    └─weight                                                      ├─4,608
│    └─bias                                                        └─32
├─MaxPool2d: 1-4                         [8, 32, 8, 8]             --
├─Linear: 1-5                            [8, 64]                   131,136
│    └─weight                                                      ├─131,072
│    └─bias                                                        └─64
├─Dropout: 1-6                           [8, 64]                   --
├─Linear: 1-7                            [8, 10]                   650
│    └─weight                                                      ├─640
│    └─bias                                                        └─10
==========================================================================================
Total params: 136,874
Trainable params: 136,874
Non-trainable params: 0
Total mult-adds (M): 14.23
==========================================================================================
Input size (MB): 0.10
Forward/backward pass size (MB): 1.58
Params size (MB): 0.55
Estimated Total Size (MB): 2.22
==========================================================================================

Consider using flatten as well: x = torch.flatten(x, start_dim=1)


Thank you for the detailed response, @balaji.ambresh.

So basically -1 only represents the batch size here, not the feature dimensions, right? (That was my confusion.)

Also, in the output you shared there is no flatten layer row; that's where my confusion begins about how that -1 actually gets calculated.

We are assuming that the first dimension is the “samples” dimension, right? So aren’t they the same thing? Meaning that the size of the first dimension (index 0 in python) is the number of samples. So it’s the size of a dimension. And the -1 just means “use whatever the size is”, meaning that the code works for any batch size.

You don’t need to specify the output size, since it’s determined by the sizes of all the dimensions other than the first dimension (samples dimension). So it’s a very nice and general way to write the code.
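To illustrate that generality (a small sketch of my own, using the lab's shapes as an assumption): the exact same view(-1, 32 * 8 * 8) call handles any batch size, because -1 is always inferred as the total number of elements divided by 2048:

```python
import torch

# the same flatten call works unchanged for any batch size
for batch in (1, 8, 16):
    x = torch.randn(batch, 32, 8, 8)
    flat = x.view(-1, 32 * 8 * 8)   # -1 is inferred as `batch`
    assert flat.shape == (batch, 2048)
print("ok")
```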

We can construct some experiments using torch.Tensor.view and torch.flatten, similar to this post about numpy np.reshape.

I will play around with this, but it may take me a few hours (have some real life to take care of). Stay tuned!

Here’s my function to create a “telltale” 4D tensor:

# routine to generate a telltale 4D tensor to play with
def testarray(shape):
    (d1,d2,d3,d4) = shape
    A = torch.zeros(*shape, dtype = torch.int32)

    for ii1 in range(d1):
        for ii2 in range(d2):
            for ii3 in range(d3):
                for ii4 in range(d4):
                    A[ii1,ii2,ii3,ii4] = ii1 * 1000 + ii2 * 100 + ii3 * 10 + ii4 

    return A

So the value in each position of the tensor shows the index values of its position in the tensor with each dimension in order. That is to say A[1,2,3,4] = 1234. Of course this is only going to be understandable if all the dimensions are single digit size.
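As a side note (my own sketch, not part of the original post), the same telltale tensor can be built without loops using broadcasting, and we can confirm the A[1,2,3,4] = 1234 property directly:

```python
import torch

# loop-free construction of the telltale tensor via broadcasting
# (assumes single-digit dimension sizes, as noted above)
d1, d2, d3, d4 = 2, 3, 4, 5
A = (torch.arange(d1).view(-1, 1, 1, 1) * 1000
     + torch.arange(d2).view(1, -1, 1, 1) * 100
     + torch.arange(d3).view(1, 1, -1, 1) * 10
     + torch.arange(d4))
print(A[1, 2, 3, 4].item())  # 1234
```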

Now let’s see what happens when we use the two “flatten” methods that we see above.

In the first case, let’s create a sample of shape (3, 2, 2, 3). So there are 3 samples, each of which is a 2 x 2 x 3 tensor. You can think of it as an image of shape 2 x 2 with 3 RGB pixel values, but the pixels are not normal 0 - 255 values.

sample3 = testarray([3, 2, 2, 3])
print(f"sample3 =\n{sample3}")
sample3 =
tensor([[[[   0,    1,    2],
          [  10,   11,   12]],

         [[ 100,  101,  102],
          [ 110,  111,  112]]],


        [[[1000, 1001, 1002],
          [1010, 1011, 1012]],

         [[1100, 1101, 1102],
          [1110, 1111, 1112]]],


        [[[2000, 2001, 2002],
          [2010, 2011, 2012]],

         [[2100, 2101, 2102],
          [2110, 2111, 2112]]]], dtype=torch.int32)

Now apply the first method of flattening using the view() method on the tensor:

sampleView = sample3.view(-1, 2*2*3)
print(f"sampleView =\n{sampleView}")
print(f"sampleView.shape =\n{sampleView.shape}")
sampleView =
tensor([[   0,    1,    2,   10,   11,   12,  100,  101,  102,  110,  111,  112],
        [1000, 1001, 1002, 1010, 1011, 1012, 1100, 1101, 1102, 1110, 1111, 1112],
        [2000, 2001, 2002, 2010, 2011, 2012, 2100, 2101, 2102, 2110, 2111, 2112]],
       dtype=torch.int32)
sampleView.shape =
torch.Size([3, 12])

So the output is a 3 x 12 2D tensor. There are 3 rows, and the thousands digit of every entry in a row matches that row's index: 0 in the first row, 1 in the second row and 2 in the third row. Within each row, you can see that the remaining dimensions are unrolled with the last dimension varying fastest (row-major order).

Now let’s try the other method using the flatten() function. We recreate the input with 4 samples this time:

sample4 = testarray([4, 2, 2, 3])
print(f"sample4 =\n{sample4}")
sample4 =
tensor([[[[   0,    1,    2],
          [  10,   11,   12]],

         [[ 100,  101,  102],
          [ 110,  111,  112]]],


        [[[1000, 1001, 1002],
          [1010, 1011, 1012]],

         [[1100, 1101, 1102],
          [1110, 1111, 1112]]],


        [[[2000, 2001, 2002],
          [2010, 2011, 2012]],

         [[2100, 2101, 2102],
          [2110, 2111, 2112]]],


        [[[3000, 3001, 3002],
          [3010, 3011, 3012]],

         [[3100, 3101, 3102],
          [3110, 3111, 3112]]]], dtype=torch.int32)

Now we apply the flatten():

sampleFlatten = torch.flatten(sample4, start_dim=1)
print(f"sampleFlatten =\n{sampleFlatten}")
print(f"sampleFlatten.shape =\n{sampleFlatten.shape}")

sampleFlatten =
tensor([[   0,    1,    2,   10,   11,   12,  100,  101,  102,  110,  111,  112],
        [1000, 1001, 1002, 1010, 1011, 1012, 1100, 1101, 1102, 1110, 1111, 1112],
        [2000, 2001, 2002, 2010, 2011, 2012, 2100, 2101, 2102, 2110, 2111, 2112],
        [3000, 3001, 3002, 3010, 3011, 3012, 3100, 3101, 3102, 3110, 3111, 3112]],
       dtype=torch.int32)
sampleFlatten.shape =
torch.Size([4, 12])

So the flattening works the same way with that method. We get 4 rows with consistent leading digits, and the other dimensions are “unfurled” with the last dimension varying fastest.

So either method works and turns out to give the same results.
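As a final check (a quick sketch of my own), we can verify programmatically that the two methods agree element-for-element on a tensor of the same shape:

```python
import torch

# both flattening methods produce identical (4, 12) results
sample = torch.arange(4 * 2 * 2 * 3).view(4, 2, 2, 3)
v = sample.view(-1, 2 * 2 * 3)
f = torch.flatten(sample, start_dim=1)
print(torch.equal(v, f))  # True
```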


Hey @paulinpaloalto,

Thank you again for that stepwise explanation of flattening using the different functions, with -1 as the sample dimension. I knew it was the sample dimension, and that is exactly why my confusion arose in the first place; but now that you have shown how each sample's remaining dimensions are unrolled into a single row vector while -1 preserves the batch dimension, my doubt is cleared. So in the example I mentioned, where samples are trained with different learning rate schedulers, this -1 keeps the flattened samples organized in batches to be trained with each learning rate.

It's like splitting into mini-batches and testing all the samples under different hyperparameter settings, which improves training time and makes better use of the GPU.

Thank you both, @balaji.ambresh and @paulinpaloalto, for resolving my doubt.