Although I understand that the -1 is related to handling the batch size in fully connected layers, I'm still a bit confused.

We are assuming that the first dimension is the "samples" dimension, right? So aren't they the same thing? That is, the size of the first dimension (index 0 in Python) is the number of samples. And the -1 just means "use whatever size this dimension happens to be," so the code works for any batch size.

You don't need to specify that size explicitly, since it can be inferred: the total number of elements divided by the product of all the other dimensions' sizes gives the size of the -1 dimension. So it's a very nice, general way to write the code.
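Here's a minimal sketch of that inference in action (the shapes are made up for illustration; this assumes PyTorch is installed):

```python
import torch

# Pretend batch of 4 "images", each 3x2x2 = 12 values.
x = torch.randn(4, 3, 2, 2)

# -1 tells view to infer that dimension: 48 total elements / 12 features
# per sample, so the batch dimension is inferred as 4.
flat = x.view(-1, 3 * 2 * 2)
print(flat.shape)  # torch.Size([4, 12])

# The exact same line works unchanged for a different batch size.
y = torch.randn(7, 3, 2, 2)
print(y.view(-1, 3 * 2 * 2).shape)  # torch.Size([7, 12])
```

Note that view only allows one -1, since only one unknown size can be inferred from the total element count.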

We can construct some experiments using torch.Tensor.view and torch.flatten, similar to this post about numpy's np.reshape.
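As a starting point for those experiments, here's a quick sketch showing that torch.flatten with start_dim=1 and the view(batch, -1) idiom produce the same result (tensor values chosen arbitrarily):

```python
import torch

x = torch.arange(24).view(2, 3, 4)

# flatten everything after the batch dimension...
a = torch.flatten(x, start_dim=1)  # shape (2, 12)

# ...which is equivalent to the view idiom with -1.
b = x.view(x.size(0), -1)          # shape (2, 12)
print(torch.equal(a, b))  # True

# A bare -1 collapses the whole tensor into one dimension.
print(x.view(-1).shape)   # torch.Size([24])
```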

I will play around with this, but it may take me a few hours (have some real life to take care of :grinning_face:). Stay tuned!