U-net exercise: a doubt about image shapes

Hi Folks,
there is a detail in the notebook “Image segmentation with U-net” that is confusing me a bit. Why do the images and masks have shape (480, 640, 4)? In particular, why is the third dimension 4 and not 3, for the three RGB channels?

And again about dimensions: why do the images and masks need to be resized to (96, 128)? What exactly does that mean?

Thank you for any hints!

Please point me to the section in code where you see (480, 640, 4).

The images and masks are resized to (96, 128) so that they are compatible with the model.
Look at the function unet_model: its very first step defines the input layer, and input_shape specifies the dimensions of the input the model accepts.
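
To make "resizing to (96, 128)" concrete, here is a minimal NumPy sketch. It is not the notebook's code: the random array is a hypothetical stand-in for one loaded image, resize_nearest is a helper written just for this illustration, and treating the 4th channel as alpha (RGBA, as PNG files often store it) is an assumption until the place in the code showing (480, 640, 4) is confirmed.

```python
import numpy as np

# Hypothetical stand-in for one loaded image: 480 x 640 pixels, 4 channels.
# Assumption: the 4th channel is alpha (RGBA), as is common for PNG files.
image = np.random.randint(0, 256, size=(480, 640, 4), dtype=np.uint8)

# Keep only the first three channels (R, G, B) and drop the assumed alpha channel.
rgb = image[:, :, :3]


def resize_nearest(img, target_hw):
    """Nearest-neighbour resize of an (H, W, C) array to (target_h, target_w, C)."""
    h, w = img.shape[:2]
    target_h, target_w = target_hw
    # For each output row/column, pick the index of the source row/column to copy.
    rows = np.arange(target_h) * h // target_h
    cols = np.arange(target_w) * w // target_w
    return img[rows[:, None], cols]


# 480 / 96 = 5 and 640 / 128 = 5, so this keeps every 5th pixel in each direction.
small = resize_nearest(rgb, (96, 128))

print(image.shape, rgb.shape, small.shape)
# (480, 640, 4) (480, 640, 3) (96, 128, 3)
```

So resizing does not crop the picture; it produces a smaller version of the whole image whose height and width match what the input layer expects. Nearest-neighbour sampling is used in this sketch because it only copies existing pixel values, which matters for masks where each value is a class label.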