First binary classification model

Why did we set 0.5 as the threshold to classify cat vs. non-cat?
And if I have more than one class, how can I set the thresholds?

thanks in advance

There are two main types of classification: binary classification and multi-class classification. In binary classification there are only two possible labels, like cat or non-cat. But the raw output of a neural network can take a broad range of values, like -1000, 30, or 4000. So, as you saw in your assignment, the sigmoid function is applied. Sigmoid is a very useful function for converting the raw output of a neural network into a value between 0 and 1, which we can interpret as a probability. Then we can simply use 0.5 as the threshold.
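As a small sketch of that idea (using the example raw values from above, and NumPy as an assumption about tooling):

```python
import numpy as np

def sigmoid(z):
    # Squash raw network outputs (logits) into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Raw outputs from a network can be any real number.
logits = np.array([-1000.0, 30.0, 4000.0, -0.3])
probs = sigmoid(logits)

# With a 0.5 threshold, each example is labeled cat (1) or non-cat (0).
labels = (probs > 0.5).astype(int)
```

Here `labels` ends up as `[0, 1, 1, 0]`: every probability above 0.5 is called "cat", everything else "non-cat".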


If we have multiple classes, i.e., multi-class classification, we usually use multiple neurons in the output layer, one for each class, like “cat”, “dog”, etc. Then we typically apply the softmax function to turn those outputs into a likelihood for each class.
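A minimal sketch of that setup, assuming three hypothetical classes (cat, dog, bird) and made-up output-neuron scores:

```python
import numpy as np

def softmax(z):
    # Convert a vector of class scores into probabilities that sum to 1.
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

# Hypothetical raw scores from three output neurons: cat, dog, bird.
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)

# Instead of a fixed threshold, we pick the class with the highest probability.
predicted = int(np.argmax(probs))  # index 0 → "cat"
```

Note that with softmax there is no single 0.5 threshold; the prediction is simply the class with the largest probability.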

Hope this helps.

1 Like

Thanks very much for your reply.
Alright, if my output values have no threshold, so the model is not doing classification,
which activation function should I use for the output layer, and which within the hidden layers?

There are different types of deep learning tasks. Typical ones are binary classification, multi-class classification, regression, clustering, and so on. As you know, deep learning is used in applications like machine translation, image classification, and object detection. We need to select appropriate activation functions for each of these.

It is difficult to say definitively which activation function should be used where, but each has different characteristics, so there are some recommended ways to use particular ones.

For the output layer of a classifier, softmax is mostly used for multi-class classification and sigmoid for binary classification, as we discussed.

For hidden layers, we need to consider the characteristics of the input data, how we can enhance linear functions (like Dense layers) by adding non-linearity, and also how we can preserve “gradients” for backpropagation, which you will learn about in a later lecture.

Sigmoid is typically not used in a hidden layer, since its gradient is small, which can lead to the “vanishing gradient” problem, a major obstacle when optimizing a network. Tanh has larger gradients than sigmoid, so it may be used in some cases. ReLU is one of the most popular activation functions because of its simple nature: it cuts off negative values and passes positive values through as-is. This makes its derivative stable and easy to compute, which is very good for backpropagation. For this reason, ReLU is mostly used in hidden layers; due to its simplicity, it is rarely used in the output layer. Also, if negative values carry important meaning that the neuron outputs should preserve, we cannot use ReLU, since it cuts them off.
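To make the gradient comparison concrete, here is a small sketch (with NumPy, using the standard closed-form derivatives of each activation) showing why sigmoid gradients shrink for large inputs while ReLU's stays at exactly 1 for positive inputs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # Derivative of sigmoid: s * (1 - s). Peaks at 0.25 when z = 0,
    # and approaches 0 as |z| grows — the source of vanishing gradients.
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_grad(z):
    # Derivative of tanh: 1 - tanh(z)^2. Peaks at 1.0 when z = 0,
    # so it passes gradients through better than sigmoid.
    return 1.0 - np.tanh(z) ** 2

def relu_grad(z):
    # Derivative of ReLU: exactly 1 for positive inputs, 0 otherwise.
    return (z > 0).astype(float)

z = np.array([-5.0, 0.0, 5.0])
```

Evaluating these at `z = 5.0`, for example: the sigmoid gradient is already below 0.01, the tanh gradient is also tiny, but the ReLU gradient is still 1.0.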

There are lots of activation functions like these. You do not need to go into detail now; you will learn some key activation functions through the series of assignments in this specialization.

This is just an introduction, but I hope it helps.

thanks for your clarification
it is highly appreciated :rose:

@anon57530071 has given very good and broad coverage of activation functions. I would just like to add one extra point: ReLU’s formula f(x) = max(0, x) is simple and less computationally demanding compared to sigmoid and tanh.
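That formula is literally a single comparison per element, as this tiny sketch (with NumPy) shows:

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x): one elementwise comparison, no exponentials,
    # unlike sigmoid (1 / (1 + e^-x)) or tanh.
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 3.0])
out = relu(x)  # negatives clipped to 0, positives passed through unchanged
```

Here `out` is `[0.0, 0.0, 0.0, 3.0]`, which is why ReLU is so cheap to evaluate at scale.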

1 Like