# First binary classification model

Why did we set 0.5 as the threshold to classify cat vs. non-cat?
And if I have more than one class, how can I set the thresholds?

There are two types of classification: binary classification and multi-class classification. In binary classification, there are only two possible outcomes, like cat or non-cat. But the raw output of a neural network can take a much broader range of values, like -1000, 30, 4000, etc. So, as you saw in your assignment, the Sigmoid function is applied. Sigmoid is a very useful function that converts the network's output into a value between 0 and 1, so we can simply use 0.5 as the threshold.
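As a minimal sketch (the logit values here are just illustrative), Sigmoid followed by a 0.5 threshold looks like this:

```python
import numpy as np

def sigmoid(z):
    # Squash any real-valued score into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Raw network outputs (logits) can be any real number
logits = np.array([-3.0, 0.2, 5.0])
probs = sigmoid(logits)               # e.g. ~[0.047, 0.550, 0.993]
preds = (probs >= 0.5).astype(int)    # 0.5 threshold: 1 = cat, 0 = non-cat
print(preds)                          # [0 1 1]
```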

If we have multiple classes, i.e., multi-class classification, we usually put multiple neurons in the output layer, each corresponding to one output class, like "cat", "dog", etc. Then we usually apply the Softmax function to turn those outputs into a likelihood for each class.
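A small sketch of that idea (the class names and logit values are made up for illustration): Softmax normalizes the output neurons into probabilities that sum to 1, and the prediction is simply the class with the highest probability, so no hand-picked threshold is needed.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability, then exponentiate and normalize
    e = np.exp(z - np.max(z))
    return e / e.sum()

classes = ["cat", "dog", "bird"]
logits = np.array([2.0, 1.0, 0.1])   # one raw score per output neuron
probs = softmax(logits)               # probabilities summing to 1
pred = classes[int(np.argmax(probs))] # pick the most likely class: "cat"
```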

Hope this helps.


Alright, and if my output values have no threshold, so the model is not doing classification,
which activation function should I use for the output layer, and which within the hidden layers?

There are different types of deep learning tasks. Typical ones are binary classification, multi-class classification, regression, clustering, and so on. As you know, deep learning is used for applications such as machine translation, image classification, and object detection. We need to select appropriate activation functions for each of these.

It is difficult to say definitively which activation function should be used for what, but each has different characteristics, so there are some recommended ways to use particular ones.

In the case of classification, as we discussed, Softmax is mostly used in the output layer for multi-class classification, and Sigmoid for binary classification.

For hidden layers, we need to consider the characteristics of the input data, how we can enhance linear functions (like Dense layers) by adding non-linearity, and also how we can keep "gradients" flowing for backpropagation, which you will learn about in a later lecture. Sigmoid is typically not used in a hidden layer, since its gradient is small, which can lead to the "vanishing gradient" problem, a major obstacle when optimizing a network. Tanh has larger gradients than Sigmoid, so it may be used in some cases. ReLU is one of the most popular activation functions because of its simple nature: it cuts off negative values and passes positive values through as-is, so its derivative is stable, which is very good for backpropagation. For this reason, ReLU is mostly used in hidden layers; due to that same simplicity, it is rarely used in the output layer. And if negative values carry important meaning that the neurons' output should preserve, we cannot use ReLU, since it cuts them off.

There are many activation functions like these. You do not need to go into detail now; you will learn some key activation functions over the series of assignments in this specialization.

This is just an introduction, but I hope it helps.