CNN: Complex loss and last activation functions

Simple case: cats and dogs example
Input shape (150,150,3) → CONV layers… → Flatten() → Dense(512, activation=‘relu’) → Dense(1, activation=‘sigmoid’)
Loss = binary_crossentropy

Complex case and question
Input shape (150,150,75)
Output shape (150,150, 500)

  • The output has the same height and width as the input
  • The model will be trained with this output shape, which is a grid of height 150 and width 150. The number of channels is 500: it is the number of potential species that could be observed within each cell or pixel.
  • The final objective for this neural network is to produce a 150x150 matrix with 500 channels, each channel holding the probability that one species is present in each pixel or cell.

To summarize:
Input shape (150,150,75) → CONV layers → … ? … → Output shape (150, 150, 500)


  • What kind of loss function could be used?
  • Should flatten be used?
  • What could be the last activation function?

Thank you for any ideas or suggestions.

If you did use flatten in the model, please click my name and message me your notebook as an attachment.

Hello @balaji.ambresh,

I did not use flatten yet.

I am just trying to define the architecture I should use (loss / activation function) to be able to output this shape (150,150,500).

The advantage of having this output shape is for example if I want to get all cell (or pixel) probabilities for species 1, I will just extract the first channel on the 150x150 output grid.
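As a small illustration of that slicing, here is a sketch in NumPy. The array `pred` is a hypothetical stand-in filled with random values, not real model output, and the cell position (10, 20) is arbitrary.

```python
import numpy as np

# Hypothetical prediction: a 150x150 grid with 500 species channels.
# Random placeholder values in [0, 1), not real model output.
rng = np.random.default_rng(0)
pred = rng.random((150, 150, 500), dtype=np.float32)

# Channel 0 holds the probabilities for species 1 in every cell.
species_1_map = pred[:, :, 0]
print(species_1_map.shape)  # (150, 150)

# Conversely, one cell holds the probabilities of all 500 species.
cell_probs = pred[10, 20, :]
print(cell_probs.shape)  # (500,)
```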

Maybe one way would be to flatten the height and width in the last layer → 150 x 150

  • Which means that the output shape instead of shape (150, 150, 500) would be (22500, 500)
  • But I still can’t figure out what kind of loss and last activation function I should use, given the fact that there could be more than 1 species out of the 500 species in the same cell or pixel
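One common choice when several classes can be present at once (multi-label) is a sigmoid on every channel with binary cross-entropy, since a softmax would force the 500 channels to sum to 1. The sketch below is in plain NumPy, on a toy 4x4 grid with 5 species standing in for 150x150x500. The helper names and the random data are assumptions for illustration only. It also suggests that flattening to (22500, 500) should not change the averaged loss.

```python
import numpy as np

def sigmoid(z):
    # Squashes each logit independently into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Mean binary cross-entropy over every (cell, species) entry.
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return -(y_true * np.log(y_prob)
             + (1.0 - y_true) * np.log(1.0 - y_prob)).mean()

H, W, S = 4, 4, 5  # toy sizes standing in for 150, 150, 500
rng = np.random.default_rng(0)
logits = rng.normal(size=(H, W, S))  # hypothetical raw network outputs
# Multi-label target: each species is independently present or absent per cell.
y_true = (rng.random((H, W, S)) < 0.3).astype(np.float64)

probs = sigmoid(logits)  # per-species probabilities; channels need not sum to 1

loss_grid = binary_crossentropy(y_true, probs)
loss_flat = binary_crossentropy(y_true.reshape(H * W, S),
                                probs.reshape(H * W, S))
print(np.isclose(loss_grid, loss_flat))  # True
```

Because the loss is averaged element by element, the grid layout can stay as it is and the flatten step looks optional here.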

This topic, semantic segmentation, is covered in week 3 of course 4 of the Deep Learning Specialization. I can’t find the Coursera video on YouTube. Please search for it or, even better, take the course. The NN is called UNet.

Please see this link as well.

Thanks a lot for this excellent input @balaji.ambresh!

As a matter of fact I took the course but didn’t make the connection, because my input and output channels hold different content (150x150x75 versus 150x150x500).

  • Input channels are environmental variables (75 of them, such as water, forest, etc.)
  • Output channels are species: as you suggested, a true mask of depth 500, one channel per species, with some species present in a pixel and some not
  • And as you suggested, the objective would be to get the predicted mask with UNet

Would it then be all right to simply add a few layers at the end of the UNet to get the correct output shape, or would that distort the intent of the UNet too much?

You’re welcome.

Since you’ve taken the course, one thing to notice is that the UNet implementation in the paper is different from the one in the assignment. In the assignment, the input shape is (96, 128, 3) and the output shape is (96, 128, n_classes).

Given the match in the first 2 dimensions, I’m in favor of creating a custom UNet (or reshaping the input to match UNet) rather than adding a last few layers to get back to the original width and height.
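For what a custom version might look like, here is a minimal Keras sketch, assuming TensorFlow 2.x. It is not the assignment's code or the paper's architecture: the function name `build_small_unet`, the single encoder/decoder level, and the filter counts are all hypothetical choices. The last layer is a 1x1 convolution with a sigmoid, so each of the 500 species gets its own probability per pixel.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_small_unet(height=150, width=150, n_inputs=75, n_species=500):
    """Hypothetical one-level U-Net-style model; layer sizes are illustrative."""
    inputs = tf.keras.Input(shape=(height, width, n_inputs))

    # Encoder: convolve, keep the result as a skip connection, then downsample.
    skip = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D(2)(skip)  # 150 -> 75

    # Bottleneck at half resolution.
    x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)

    # Decoder: upsample back to full resolution and merge with the skip.
    x = layers.Conv2DTranspose(64, 2, strides=2, padding="same")(x)  # 75 -> 150
    x = layers.Concatenate()([x, skip])
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)

    # 1x1 convolution: one independent sigmoid probability per species per pixel.
    outputs = layers.Conv2D(n_species, 1, activation="sigmoid")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

model = build_small_unet()
print(model.output_shape)  # (None, 150, 150, 500)
```

Note that 150 halves cleanly only once (75 is odd), so a deeper version would probably need padding or cropping to keep the skip connections aligned.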

Why don’t you try both and reply to this thread with the results?


Awesome, @balaji.ambresh,

Sure no problem, I will try both and reply to the thread. Give me a few months, as I have ongoing projects I need to finish first.

Looking forward to trying this, thanks again for your great advice!