A bit confused about how logistic regression works … So it computes a value z for each point, derived from z = wᵀx + b? Then it finds a = σ(z) for each point and calculates the loss/cost function and its gradient? Based on the results of these functions, gradient descent changes the position of the best-fit boundary so everything is optimized? Is this an accurate general description of how it works?
Thanks in advance.
Dear @Neeral_Bhalgat,
The question seems general, but it touches on everything involved in how a neural network learns. So let me try to explain it in detail.
Prof Ng has clearly mentioned that Logistic Regression is an approach to classification problems, not regression problems. Since Linear Regression is not apt for modeling binary outcomes between 0 and 1, we generally use Logistic Regression instead. "Binary outcome" here means the data is linearly separable but the outcome is dichotomous in nature.
Now, let us see how logistic regression works. To implement logistic regression, we will consider a model with a predictor "x", a response variable "ŷ" (Bernoulli-distributed) and "p" as the probability that ŷ = 1.
The linear equation could be written as follows:
p = b0+b1x
Notice that the right-hand side of the above equation can take values outside the range [0, 1]. To address this, we work with the odds: the probability of the event occurring divided by the probability of it not occurring -----> odds = p/(1-p).
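As a quick numerical illustration (the probability value here is made up):

```python
p = 0.8             # probability of the event (illustrative value)
odds = p / (1 - p)  # odds: the event is p/(1-p) times as likely to happen as not
print(odds)         # roughly 4, i.e. 4-to-1 in favour of the event
```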
The new equation will appear as:
p/(1-p) = b0+b1x
The odds are still restricted to positive values while the right-hand side can be negative, so in practice we model the log of the odds (the logit): ln(p/(1-p)) = b0+b1x.
Thus, solving for a logistic model, the expression becomes: p = 1 / (1 + e^-(b0+b1x1+b2x2+b3x3+…+bnxn)), obtained through a couple of mathematical steps such as the inverse rule of logarithms, algebraic manipulation, division and multiplication.
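Spelling out those steps for the single-predictor case, starting from the log of the odds:

ln(p/(1-p)) = b0 + b1x                      (model the log-odds, which range over all real numbers)
p/(1-p) = e^(b0+b1x)                        (exponentiate both sides)
p(1 + e^(b0+b1x)) = e^(b0+b1x)              (multiply out and collect the p terms)
p = e^(b0+b1x) / (1 + e^(b0+b1x)) = 1 / (1 + e^-(b0+b1x))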
This expression is "the sigmoid function", which maps any predicted value to a probability in the range 0 to 1.
You can see the difference between a linear regression based model and a logistic regression based model here.
A logistic regression model would look something like the picture below.
Hence, a linear equation (z) feeds into a sigmoid activation function (σ) to predict the value of ŷ.
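This forward pass can be sketched in a few lines of NumPy (the feature values, weights and bias below are made-up numbers, purely for illustration):

```python
import numpy as np

def sigmoid(z):
    # Maps any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Toy example: one input with 3 features (illustrative values)
x = np.array([0.5, -1.2, 2.0])   # input features
w = np.array([0.8, 0.3, -0.5])   # weight vector
b = 0.1                          # bias

z = np.dot(w, x) + b             # linear part: z = wᵀx + b
y_hat = sigmoid(z)               # predicted probability that y = 1
print(z, y_hat)
```

Whatever z comes out of the linear part, y_hat always lands strictly between 0 and 1, which is exactly why the sigmoid is used for probabilities.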
In general, to evaluate how well a model is doing, we compute a loss function. In logistic regression, since the outcome is binary, we use the cross-entropy loss instead of the mean squared error that is used in linear regression models.
We have to remember here that the cross-entropy function measures the performance of a classification model whose output is a probability value.
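A minimal sketch of the binary cross-entropy loss (the labels and predicted probabilities below are invented for illustration):

```python
import numpy as np

def cross_entropy(y, y_hat, eps=1e-12):
    # Binary cross-entropy averaged over examples;
    # eps clips predictions away from 0 and 1 to avoid log(0)
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y = np.array([1.0, 0.0, 1.0, 1.0])       # true labels (illustrative)
y_hat = np.array([0.9, 0.2, 0.8, 0.6])   # predicted probabilities (illustrative)
print(cross_entropy(y, y_hat))           # small-ish: predictions mostly agree

# A confident wrong prediction is penalised much more heavily
print(cross_entropy(np.array([1.0]), np.array([0.01])))
```

Note how the loss grows without bound as a prediction gets confidently wrong; this is the property that drives the weights toward well-calibrated probabilities.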
Logistic regression is deeply linked to how a neural network works: each neuron in the network can be viewed as a logistic regression with its own inputs, weights and bias. We take the dot product of the inputs and weights, add the bias, and only then apply a non-linear function.
In the figure below, the last layer of the neural network is essentially a basic linear model: its input is the hidden layer, and we have the weights, a dot product and finally a non-linear function. The first part of the network, on the left-hand side, learns a representation of the data, which the second part then uses to do the linear classification/regression.
Thus, by minimising the loss function, the model improves its performance; this is how the model is optimised to become fully functional.
Now you should have a good idea of how logistic regression works with weight matrices and bias vectors through the sigmoid calculation.
Thank you! This makes so much more sense. Just to confirm @Rashmi, the position of the sigmoid curve is adjusted based on the loss/cost function, so that the loss is minimized?
Hi @Neeral_Bhalgat, yes.
They are closely related. A sigmoid neuron makes use of a loss function, in this case the cross-entropy function. By changing the values of 'w' and 'b', we get different sigmoid curves, changing the position and slope of the curve. We normally start from a random sigmoid function and gradually reach the desired one, for which the loss is minimal.
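That "start random, gradually improve" process is just gradient descent. Here is a minimal sketch on a synthetic 1-D dataset (the data, learning rate and step count are all illustrative choices, not anything prescribed by the course):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic 1-D data: label is 1 exactly when x is positive (illustrative)
x = rng.normal(size=100)
y = (x > 0).astype(float)

w, b = 0.0, 0.0   # initial (uninformative) sigmoid
lr = 0.5          # learning rate (illustrative)

for step in range(2000):
    y_hat = sigmoid(w * x + b)        # forward pass
    # Gradients of the mean cross-entropy loss w.r.t. w and b
    dw = np.mean((y_hat - y) * x)
    db = np.mean(y_hat - y)
    w -= lr * dw                      # gradient descent updates:
    b -= lr * db                      # each step reshapes the sigmoid slightly

accuracy = np.mean((sigmoid(w * x + b) > 0.5) == y)
print(w, b, accuracy)
```

After training, w has grown positive (steepening the sigmoid around x = 0), which is exactly the "desired sigmoid" for this data.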
How can I practice these concepts? Having it explained is great but I never remember anything unless I can use it. Any suggestions of how, where and what to do that with? Not just Logistic Regression but all of the algorithms that are used in this course.
Thanks.
Hi, Matthew Woolhouse.
You are absolutely right!
You can create your own dataset, or pick any dataset that is easily available to you, and practice what you've learned so far in the DL specialization. There are various platforms that can help: Kaggle (a good starting point for beginners) hosts datasets and guided projects; on GitHub you can explore and contribute to projects maintained by professionals; and there are plenty of open-access projects you can study to explore more. Besides, you can practice with the same datasets by opening a new Jupyter notebook on Coursera; I don't think the team will ever charge extra if you use it to hone your skills and practice, as long as your subscription is active.
Insights from other mentors are always welcomed!