@lucas.coutinho
can you look into the query the learner raised? The explanation in the assignment, of what input shape the layer expects versus what it receives and why that raises an error, reads as confusing. Here is a clearer walkthrough:
When a pooling layer operates on a CNN layer output of shape (batch_size, channels, height, width), such as (64, 32, 14, 14):
64 is the batch size.
32 is the number of channels (feature maps).
14, 14 are the height and width of each feature map.
A pooling layer, for example, a 2x2 max pooling with a stride of 2, would reduce the height and width by half. In this case, the 14x14 feature maps would become 7x7. The number of channels (32) and the batch size (64) would remain unchanged. Therefore, the output shape after this pooling layer would be (64, 32, 7, 7).
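The shape change above can be verified with a minimal PyTorch sketch (the tensor here is random dummy data, used only to match the example shape):

```python
import torch
import torch.nn as nn

# Dummy tensor matching the example shape: (batch_size, channels, height, width)
x = torch.randn(64, 32, 14, 14)

# 2x2 max pooling with stride 2 halves the spatial dimensions
pool = nn.MaxPool2d(kernel_size=2, stride=2)
out = pool(x)

print(out.shape)  # torch.Size([64, 32, 7, 7])
```

Batch size and channel count pass through unchanged; only height and width are reduced.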
The input-shape error at fc1.linear occurs because fully connected (linear) layers expect a 1D input for each sample in the batch, while the output of the pooling layer, (64, 32, 7, 7), is a 4D tensor. Before being passed to fc1.linear, it needs to be flattened.
Solution: To resolve this, the output of the pooling layer must be reshaped (flattened), collapsing the channel and spatial dimensions into a single dimension for each sample in the batch.
The resulting shape would be (batch_size, channels * pooled_height * pooled_width).
For the example output of (64, 32, 7, 7), the flattened size for each sample would be 32 * 7 * 7 = 1568. The input to fc1.linear would then have shape (64, 1568), so the layer should be initialized as nn.Linear(1568, output_features).
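Putting the flatten-then-linear step together, a minimal sketch (output_features=10 is an assumed class count chosen only for illustration; the learner's actual value may differ):

```python
import torch
import torch.nn as nn

# Pooled output from the example: (batch_size, channels, height, width)
x = torch.randn(64, 32, 7, 7)

# Flatten everything except the batch dimension;
# torch.flatten(x, start_dim=1) is equivalent
flat = x.view(x.size(0), -1)
print(flat.shape)  # torch.Size([64, 1568])

# 32 * 7 * 7 = 1568 input features; output_features is assumed here
output_features = 10
fc1 = nn.Linear(32 * 7 * 7, output_features)

y = fc1(flat)
print(y.shape)  # torch.Size([64, 10])
```

In a model's forward method, the same view (or torch.flatten) call is placed between the pooling layer and fc1.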
Regards
DP