# First binary classification model

Why did we set 0.5 as the threshold to classify cat vs. non-cat?
And if I have more than one class, how can I set the thresholds?

There are two types of classification: binary classification and multi-class classification. In binary classification, there are only two possible outcomes, like cat or non-cat. But the raw output of a neural network can take a much broader range of values, like -1000, 30, 4000, etc. So, as you saw in your assignment, the Sigmoid function is applied. Sigmoid is a very useful function that converts the network's output into a value between 0 and 1, so we can simply use 0.5 as the threshold.
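As a minimal sketch (the logit values here are just illustrative), Sigmoid followed by a 0.5 threshold looks like this:

```python
import numpy as np

def sigmoid(z):
    # Squash any real-valued score into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Raw network outputs (logits) can be any real number
logits = np.array([-3.0, 0.2, 5.0])
probs = sigmoid(logits)               # e.g. ~[0.047, 0.550, 0.993]
preds = (probs >= 0.5).astype(int)    # 0.5 threshold: 1 = cat, 0 = non-cat
print(preds)                          # [0 1 1]
```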

If we have multiple classes, i.e., multi-class classification, we usually put multiple neurons in the output layer, each corresponding to one output class, like "cat", "dog", etc. Then we usually apply the Softmax function to turn those outputs into a likelihood for each class.
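A small sketch of that idea (the class names and logit values are made up for illustration): Softmax normalizes the output neurons into probabilities that sum to 1, and the prediction is simply the class with the highest probability, so no hand-picked threshold is needed.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability, then exponentiate and normalize
    e = np.exp(z - np.max(z))
    return e / e.sum()

classes = ["cat", "dog", "bird"]
logits = np.array([2.0, 1.0, 0.1])   # one raw score per output neuron
probs = softmax(logits)               # probabilities summing to 1
pred = classes[int(np.argmax(probs))] # pick the most likely class: "cat"
```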

Hope this helps.


Alright, and if my output values have no threshold, so the model is not doing classification,
which activation function should I use for the output layer, and which within the hidden layers?

There are different types of deep learning tasks. Typical ones are binary classification, multi-class classification, regression, clustering, and so on. As you know, deep learning is used for applications such as machine translation, image classification, and object detection. We need to select appropriate activation functions for each of these.

It is difficult to say definitively which activation function should be used for what, but each has different characteristics, so there are some recommended ways to use particular ones.

In the case of classification, as we discussed, Softmax is mostly used in the output layer for multi-class classification, and Sigmoid for binary classification.

For hidden layers, we need to consider the characteristics of the input data, how we can enhance linear functions (like Dense layers) by adding non-linearity, and also how we can keep "gradients" flowing for backpropagation, which you will learn about in a later lecture. Sigmoid is typically not used in a hidden layer, since its gradient is small, which can lead to the "vanishing gradient" problem, a major obstacle when optimizing a network. Tanh has larger gradients than Sigmoid, so it may be used in some cases. ReLU is one of the most popular activation functions because of its simple nature: it cuts off negative values and passes positive values through as-is, so its derivative is stable, which is very good for backpropagation. For this reason, ReLU is mostly used in hidden layers; due to that same simplicity, it is rarely used in the output layer. And if negative values carry important meaning that the neurons' output should preserve, we cannot use ReLU, since it cuts them off.

There are many activation functions like these. You do not need to go into detail now; you will learn some key activation functions over the series of assignments in this specialization.

This is just an introduction, but I hope it helps.