Hi, I need some clarification regarding the Week 2 Logistic Regression Cost Function Video.
The video started by interpreting \hat{y} as a probability, P(y=1|x), and went on to define P(y|x) as follows:
\begin{equation}P(y|x) = \begin{cases}\hat{y} &y = 1\\1-\hat{y} &y = 0\end{cases} \end{equation}
Professor Ng then wrote the same definition in closed form: P(y|x) = \hat{y}^{y}(1-\hat{y})^{1-y}
My first question is: wouldn't it be possible to write P(y|x)'s closed form some other way as well? For example, P(y|x) = \hat{y}y + (1-\hat{y})(1-y) would also be valid for y = 1 and y = 0. Is this particular closed form chosen because of how easy it is to take the derivative, or how easy it is to use with gradient descent?
(I tried searching around and found that this closed form is the usual one associated with the probability mass function of a Bernoulli distribution, but I couldn't find out why this form is preferred.)
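For example, a quick check in plain Python (with an arbitrary \hat{y} = 0.7, just to illustrate what I mean) shows both expressions give the same value at y = 0 and y = 1:

```python
y_hat = 0.7  # an arbitrary predicted probability, just for illustration

for y in (0, 1):
    product_form = y_hat ** y * (1 - y_hat) ** (1 - y)  # the Bernoulli-style closed form from the video
    sum_form = y_hat * y + (1 - y_hat) * (1 - y)        # the alternative form I'm asking about
    print(y, product_form, sum_form)
# both forms print the same value for y = 0 (0.3) and for y = 1 (0.7)
```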
My second question is: what does it mean to optimize P(y|x)? I take P(y|x) to mean the probability of y given x, but since there is always some y given x, isn't P(y|x) just 1? Professor Ng talked about optimizing P(y|x) by finding its maximum. Is this because the maximum of any probability is 1, so we should push P(y|x) toward it?
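To make my confusion concrete: if a training example has label y = 1 and the model outputs, say, \hat{y} = 0.8 (a number I'm just making up), then by the closed form

\begin{equation}P(y|x) = \hat{y}^{1}(1-\hat{y})^{0} = 0.8,\end{equation}

which is less than 1, so I'm not sure how that fits with my intuition that the y we actually observed, given x, should have probability 1.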
Thanks