Questions about dataset in "Image_segmentation_Unet_v2"

Dear friend, mentor,

I already finished the homework for Image_segmentation_Unet_v2, but I still have some detailed questions about the data. This may take a little bit of your time. Thank you in advance.

Q1. Why does the data have 4 channels, like in the code below? Do the original image and the mask both have 4 channels? I thought a regular picture only has 3 RGB channels.

import imageio
import matplotlib.pyplot as plt

N = 2
# img and mask both have shape (480, 640, 4)
img = imageio.imread(image_list[N])
mask = imageio.imread(mask_list[N])
# mask = np.array([max(mask[i, j]) for i in range(mask.shape[0]) for j in range(mask.shape[1])]).reshape(img.shape[0], img.shape[1])

fig, arr = plt.subplots(1, 2, figsize=(14, 10))
arr[1].imshow(mask[:, :, 0])

Q2. The original image and mask are (480, 640, 4), but this U-Net has 23 classes. Does the mask have 4 classes? My understanding is that the training mask data has 4 classes, but the final U-Net output can distinguish 23 classes? I am confused here.

Q3. If a picture is made from 3 RGB channels, what is a mask made from? I think it is the number of channels: maybe the 1st channel segments the road, the 2nd does trees, the 3rd does sky… If you plot imshow(mask[:, :, 0]), the mask looks “reasonable”, but if I plot all channels with imshow(mask), the picture is black. If I plot any other single channel, the picture is just dark purple. I am not following the idea of all black (all channels) or all purple (channel 1, 2, or 3).

Q4. This is a silly question, lol. The “processed_image_ds” holds the data (img and mask). But which line of code in the homework tells the U-Net that img is the data and mask is the label? It could be the other way around, right…

def process_path(image_path, mask_path):
    img = tf.io.read_file(image_path)
    img = tf.image.decode_png(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)

    mask = tf.io.read_file(mask_path)
    mask = tf.image.decode_png(mask, channels=3)
    mask = tf.math.reduce_max(mask, axis=-1, keepdims=True)
    return img, mask

def preprocess(image, mask):
    input_image = tf.image.resize(image, (96, 128), method='nearest')
    input_mask = tf.image.resize(mask, (96, 128), method='nearest')

    return input_image, input_mask

image_ds =
processed_image_ds =

train_dataset = processed_image_ds.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
model_history =, epochs=EPOCHS)
# How did the U-Net know that, in train_dataset, img is the data and mask is the label?

Q5. This is really a critical question; I hope I can put it here. In the real world, how do you make those masks, how do you create the training data? Let’s assume humans are catching some brand-new alien virus (nobody has seen it before), and I want to do segmentation on X-ray pictures to “draw” the infected area. How would you do it? Ask a human to hand-draw the segmentation one by one, picture by picture? That takes forever…

Note that the files here are PNG files, not JPEG or TIFF files. In PNG format, one of the options is to include the “alpha” channel, which has to do with transparency when you render layered images. In these particular images, the alpha channel is always 0 and can be ignored. Here’s another thread that discusses this. If you want to know more about PNG files, the search terms should be obvious.
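To make the alpha-channel point concrete, here is a minimal numpy sketch (a hypothetical 2×2 RGBA array, not the actual dataset files) showing that ignoring alpha is just slicing off the 4th channel:

```python
import numpy as np

# Hypothetical RGBA image: height 2, width 2, 4 channels (R, G, B, alpha).
rgba = np.zeros((2, 2, 4), dtype=np.uint8)
rgba[..., 0] = 200          # put something in the red channel
rgba[..., 3] = 0            # alpha channel is always 0 in this dataset

rgb = rgba[:, :, :3]        # keep only the first 3 (RGB) channels
print(rgb.shape)            # (2, 2, 3)
```

(In the notebook the same thing is accomplished by decoding with `channels=3`, which never materializes the alpha channel in the first place.)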

That was discussed on that other thread I linked. If you actually look at the contents of the mask files, you’ll find that all the channel values are 0 except the first one (channel 0). So you’ll notice that the logic discards the other 3 values. That one value can assume any one of the 23 different possible label values.
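A small numpy sketch of that discarding logic (with a hypothetical 2×2 mask, not the real data): since channels 1–3 are all 0, taking the max over the channel axis — which is what the notebook’s `tf.math.reduce_max(mask, axis=-1, keepdims=True)` does — simply keeps the channel-0 label:

```python
import numpy as np

# Hypothetical 2x2 mask with 4 channels; only channel 0 carries the
# class label (a value in 0..22), the other 3 channels are all 0.
mask = np.zeros((2, 2, 4), dtype=np.uint8)
mask[..., 0] = [[3, 7], [0, 22]]   # per-pixel class labels

# Same effect as tf.math.reduce_max(mask, axis=-1, keepdims=True):
# the max over channels is just channel 0, since the rest are 0.
collapsed = mask.max(axis=-1, keepdims=True)
print(collapsed.shape)                                  # (2, 2, 1)
print(np.array_equal(collapsed[..., 0], mask[..., 0]))  # True
```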

In any situation, you need to understand the meaning of your data and how it was created. I can’t make any general statement about how people in general deal with masks in formats other than PNG (the example we see here). My guess is that the reasonable thing to do would be what they do here: put the mask labels on channel 0. If the question is how the mask files are created, that’s covered under Q5 below.

It’s the same answer as to Q3: you need to understand your data. There is no general rule or general format. You need to figure it out in your particular situation. Or if you are creating the data, then you get to decide. Sure, they could have given you the mask files first, but in all the examples we’ve seen in DLS so far, you have X and Y, where X is the input data and Y is the labels.

This is a hard and (as you say) critical question and I don’t know the answer. It doesn’t take much thinking to realize that creating these mask files is a very complex and work intensive task. There have been some earlier threads on this, e.g. this one and here’s one that talks about some research work in this area. Perhaps someone else listening here has already looked into this and can provide more information.

Hi Paul, thanks for your reply.
For Q2, I am not quite following when you say “That one value can assume any one of the 23 different possible label values.” Are you saying channel 0 of the mask data (PNG) is enough for the U-Net to learn 23 classes? In other words, the number of channels in the mask file has nothing to do with the number of U-Net output classes, right?

For Q4, I was not clear in my question. For example, in a regular CNN, the data has two parts: the X data and the Y label. We need to tell the CNN which one is which, right? So, in this homework, which line of code tells the U-Net during training that “img is the X data, mask is the Y label”? Thank you!

For Q2, yes, you only need one value to represent the category for a given pixel. Note that this is like using softmax as the output of a multiclass classifier, except that here it happens at every pixel. The softmax output will be a 23 x 1 vector, so you have to convert the categorical (single value) representation of the pixel’s label to the “one hot” representation in order to compute the loss, but you do that “on the fly”: you always store the data in the categorical form because it takes 23x less space, right?
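A minimal numpy sketch of that categorical-to-one-hot conversion (hypothetical labels for 4 pixels):

```python
import numpy as np

N_CLASSES = 23

# Hypothetical categorical labels for 4 pixels (each a value in 0..22).
labels = np.array([0, 5, 22, 5])

# One-hot "on the fly": each single label becomes a 23-element vector.
one_hot = np.eye(N_CLASSES, dtype=np.float32)[labels]
print(one_hot.shape)        # (4, 23)
print(one_hot[1].argmax())  # 5 -- argmax recovers the categorical label
```

In practice Keras can even skip the explicit conversion for you: `SparseCategoricalCrossentropy` accepts the categorical (integer) labels directly.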

For Q4, note that it’s the same as always: the X data is used as input to the model and the Y labels come into play only when you get the output of the network (\hat{Y}) and then need to compare that to Y to calculate the loss. Where does that happen in the logic in the notebook?
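As a numerical aside, here is a tiny numpy sketch of what “comparing that to Y to calculate the loss” looks like for a sparse categorical crossentropy loss (hypothetical values, and 3 classes instead of 23 for brevity):

```python
import numpy as np

# Hypothetical labels and network outputs for 2 pixels.
y = np.array([2, 0])                # categorical labels
y_hat = np.array([[0.1, 0.1, 0.8],  # softmax outputs over 3 classes
                  [0.7, 0.2, 0.1]])

# Sparse categorical crossentropy: -log(probability of the true class),
# averaged over the pixels.
loss = -np.log(y_hat[np.arange(len(y)), y]).mean()
print(round(loss, 4))   # 0.2899
```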

I just compared it with the previous homework. I think the line dataset = tf.data.Dataset.from_tensor_slices((image_filenames, masks_filenames)) is doing this job: the X data is image_filenames, the Y data is masks_filenames. Is this correct?

Probably, but the question is what is done with the variable called dataset, right? That statement just says that dataset is an “iterator” that gives you two file names on every iteration: the first is an image filename and the second is the corresponding mask filename. So what do you then do with that iterator?


OK, this is how I found out. I checked the homework “Convolution_model_Application”. It has the code below:

train_dataset = tf.data.Dataset.from_tensor_slices((X_train, Y_train)).batch(64)
test_dataset = tf.data.Dataset.from_tensor_slices((X_test, Y_test)).batch(64)
history =, epochs=100, validation_data=test_dataset)
# hint:, Y_train, epochs=10, batch_size=16)

So, when I see X_train, Y_train, I assume this is the place you let the NN know which one is the data and which one is the label. I also checked the reference tf.keras.Model | TensorFlow v2.11.0, which documents the fit() function’s x and y arguments.

So, I assume x indicates the data and y is the label. In our case, the line dataset = tf.data.Dataset.from_tensor_slices((image_filenames, masks_filenames)) puts the x and y together. Not sure if I answered correctly or not :frowning:

Yes, I think that’s the right idea. The place where the real action happens is the “fit()” method of the model. That’s where the training happens.

Notice that if you study the code in the U-Net notebook, the dataset gets passed through two “map()” calls before it is handed to the “fit()” method. Those “map()” calls just invoke preprocessing functions that preserve the format of the return values: the first element is the image and the second is the mask.

The first function is process_path. Check what it does in addition to reading the files: that’s where the 4 channels we were discussing get handled, for both the images and the masks.

Then the next function called through “map()” is preprocess, which resizes the images to the expected size.
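That two-stage pipeline can be sketched in pure Python (with hypothetical string stand-ins for the images and masks, and Python’s built-in map rather than each stage consumes an (image, mask) pair and returns an (image, mask) pair, so the tuple structure survives all the way to fit():

```python
# Sketch only: stand-ins for the notebook's process_path and preprocess.
# (The real versions take two arguments via; here each
# takes one (image, mask) pair for simplicity.)
def process_path(pair):
    image_path, mask_path = pair
    return ("decoded:" + image_path, "decoded:" + mask_path)

def preprocess(pair):
    image, mask = pair
    return ("resized:" + image, "resized:" + mask)

dataset = [("img0.png", "mask0.png"), ("img1.png", "mask1.png")]
processed = list(map(preprocess, map(process_path, dataset)))
print(processed[0])
# ('resized:decoded:img0.png', 'resized:decoded:mask0.png')
```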


Thanks for the details again.