Hi @Barb
As you have seen, nn.ConvTranspose2d does not take the height and width of the tensor as arguments. The reason is that the height and width of the output tensor are determined by the input size together with other parameters such as padding, stride, and kernel size.
The height and width of a tensor after a normal convolution operation are calculated using the formulas given below (nn.ConvTranspose2d uses a different formula). Here floor means rounding down to the nearest integer, and dilation is assumed to be 1:
New_Tensor_Height = floor((Current_Tensor_Height + 2 * Padding - Kernel_Height) / Stride) + 1
New_Tensor_Width = floor((Current_Tensor_Width + 2 * Padding - Kernel_Width) / Stride) + 1
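If it helps, here is a minimal sketch of that calculation as a plain Python function. The name conv2d_output_size is just something I made up for illustration; it is not part of PyTorch. It follows the shape formula from the nn.Conv2d documentation, including the dilation term (which defaults to 1), and it rounds the division down.

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension (height or width).

    Hypothetical helper for illustration, not a PyTorch function.
    """
    # Integer division (//) rounds down, which is what the floor in the formula does.
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# A length-8 input with a 3-wide kernel and stride 2: (8 - 3) / 2 = 2.5, rounded down to 2, plus 1.
print(conv2d_output_size(8, kernel_size=3, stride=2))  # 3

# kernel_size=3 with padding=1 and stride=1 keeps the size unchanged.
print(conv2d_output_size(32, kernel_size=3, stride=1, padding=1))  # 32
```

You would call it once for the height and once for the width.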
For example, suppose you have an input tensor of shape (1, 1, 50, 128) and you apply nn.Conv2d to it with the following parameters:
in_channels = 1
out_channels = 10
kernel_size = 3
padding = 2
stride = 2
Then the new height and width of the tensor will be calculated as follows:
New_Tensor_Height = floor((50 + 2 * 2 - 3) / 2) + 1 = floor(25.5) + 1 = 26 (dims can’t be float, so the division is rounded down)
New_Tensor_Width = floor((128 + 2 * 2 - 3) / 2) + 1 = floor(64.5) + 1 = 65 (rounded down again)
So, the shape of the output tensor will be (1, 10, 26, 65). Run the following code to see our experiment in action.
import torch
# A batch of 1 sample with 1 channel, height 50 and width 128
input_tensor = torch.ones([1, 1, 50, 128])
# 10 kernels of size 3x3, applied with stride 2 and zero-padding 2
conv_layer = torch.nn.Conv2d(in_channels=1, out_channels=10, kernel_size=3, stride=2, padding=2)
output_tensor = conv_layer(input_tensor)
# The output will be: torch.Size([1, 10, 26, 65])
print(output_tensor.shape)
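Since your question was about the transposed convolution, here is a similar sketch for that case. It uses the output-size formula from the nn.ConvTranspose2d documentation; the function name conv_transpose2d_output_size is my own and is not a PyTorch API. I am reusing the 26 x 65 feature map and the kernel_size, stride, and padding values from the experiment above purely as an illustration.

```python
def conv_transpose2d_output_size(size, kernel_size, stride=1, padding=0,
                                 output_padding=0, dilation=1):
    """Output length along one spatial dimension for a transposed convolution.

    Hypothetical helper for illustration, not a PyTorch function.
    """
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


# Going back from the 26 x 65 feature map with kernel_size=3, stride=2, padding=2.
height = conv_transpose2d_output_size(26, kernel_size=3, stride=2, padding=2)
width = conv_transpose2d_output_size(65, kernel_size=3, stride=2, padding=2)
print(height, width)  # 49 127

# The rounding in the forward convolution loses one pixel here;
# output_padding=1 adds it back.
height = conv_transpose2d_output_size(26, kernel_size=3, stride=2, padding=2, output_padding=1)
width = conv_transpose2d_output_size(65, kernel_size=3, stride=2, padding=2, output_padding=1)
print(height, width)  # 50 128
```

So the layer never needs the height and width as arguments. You choose kernel_size, stride, padding, and output_padding so that the formula lands on the size you want.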
Now let us understand the channels. The general PyTorch tensor format for images is [N, C, H, W], i.e. [batch_size, number of channels, height, width]. There is no restriction on the number of channels, so you can create a tensor with as many channels as you want. For instance, a color image has 3 channels (R, G, and B), while a grayscale image has only a single channel. You can also create your own tensor with shape [1, 16, 50, 128], which has 16 channels.
So you only need to tell the conv layer the number of channels before and after the operation. In the previous example, you created conv_layer, which takes an input tensor with only 1 channel. The convolution operation defines and uses 10 kernels (out_channels determines the number of kernels used for the convolution), and convolving them with the input generates an output tensor with 10 channels.
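To see this in practice, you can look at the shape of the layer's weight tensor, which is laid out as (out_channels, in_channels, kernel_height, kernel_width) when groups is left at its default of 1. The sketch below also tries the 16-channel tensor mentioned above; the names wide_input and wide_conv are only for illustration.

```python
import torch

# Same layer as in the experiment above: 1 input channel, 10 output channels.
conv_layer = torch.nn.Conv2d(in_channels=1, out_channels=10, kernel_size=3, stride=2, padding=2)
# 10 kernels, each covering 1 input channel with a 3x3 window.
print(conv_layer.weight.shape)  # torch.Size([10, 1, 3, 3])

# A 16-channel input needs a layer created with in_channels=16.
wide_input = torch.ones([1, 16, 50, 128])
wide_conv = torch.nn.Conv2d(in_channels=16, out_channels=10, kernel_size=3, stride=2, padding=2)
# Each of the 10 kernels now spans all 16 input channels.
print(wide_conv.weight.shape)  # torch.Size([10, 16, 3, 3])
# The channel count changes from 16 to 10; height and width follow the same formula as before.
print(wide_conv(wide_input).shape)  # torch.Size([1, 10, 26, 65])
```

Note that changing the number of channels has no effect on the output height and width.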