Understanding one hot output shape

Hi

Could someone please explain why the ground truth patch of 160x160x16, with 4 channels, gets converted to a 3x160x160x160?
From my understanding, each voxel can be one of 4 different classes, so shouldn't the one-hot output be 160x160x16x4, similar to the example in the lab on sub-sampling?

I figured it out: there is a typo in the notebook. Where it is written as 3x160x160x160, it should have been 3x160x160x16.


Thank you for letting us know @Jairaj_Mathur!


Hi @Jairaj_Mathur

The ground truth patch of 160x160x16 gets converted to 3x160x160x16 (as you noted, 3x160x160x160 is a typo) because one-hot encoding first produces 4 binary channels, one per class, and the background channel (class 0) is then dropped, leaving 3 channels.

One-hot encoding is a process used to convert a categorical variable into a numerical one. It creates a new binary channel for each unique category, containing 1s and 0s, where 1 represents the presence of that category at a given position and 0 its absence. In the case of the ground truth patch, each voxel holds one of 4 class labels, so one-hot encoding produces 4 binary channels. The background channel is then removed, so each voxel ends up represented by 3 binary channels indicating the presence or absence of each of the remaining classes.
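As a rough NumPy sketch (a hypothetical toy label array, not the lab's exact code), one-hot encoding can be done by indexing an identity matrix with the integer labels:

```python
import numpy as np

# Hypothetical tiny 2x2 "image" of class indices 0..3 (4 classes)
label = np.array([[0, 1],
                  [2, 3]])

num_classes = 4
# Indexing the identity matrix by the labels yields one binary channel per class
one_hot = np.eye(num_classes, dtype=np.uint8)[label]

print(one_hot.shape)   # (2, 2, 4)
print(one_hot[0, 1])   # voxel of class 1 -> [0 1 0 0]
```

Each voxel's integer label becomes a binary vector with a single 1 in the position of its class.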

The raw label patch has shape (160, 160, 16), where each voxel holds an integer class index from 0 to 3 and the three dimensions are spatial. One-hot encoding expands this to (160, 160, 16, 4). The channel axis is then moved to the front and the background channel is dropped, giving the final shape (3, 160, 160, 16): 3 class channels followed by the three spatial dimensions.
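The whole shape transformation can be sketched like this in NumPy (a random placeholder label volume stands in for the real ground truth):

```python
import numpy as np

# Assumed raw label patch: one integer class index (0..3) per voxel
y = np.random.randint(0, 4, size=(160, 160, 16))

# One-hot encode: (160, 160, 16) -> (160, 160, 16, 4)
y = np.eye(4, dtype=np.uint8)[y]

# Move the channel axis to the front: -> (4, 160, 160, 16)
y = np.moveaxis(y, -1, 0)

# Drop the background channel (class 0): -> (3, 160, 160, 16)
y = y[1:]

print(y.shape)  # (3, 160, 160, 16)
```

The drop of channel 0 is what turns the 4 classes into 3 output channels.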

Regarding the lab on sub-sampling: the intermediate one-hot output does have shape 160x160x16x4, but the lab then moves the channel axis to the front and excludes the background channel. This is done because the 3D convolutional layers in the model expect channels-first input, and the background class is not predicted directly.

In summary, the ground truth patch of 160x160x16 gets converted to 3x160x160x16 because each voxel is one-hot encoded into binary class channels, the channel axis is moved to the front, and the background channel is dropped, leaving 3 channels in the layout expected by the 3D convolutional layers.

Hope this answers your question.

Regards
Muhammad John Abbas