- From this:

How are we obtaining this? That is, how are y and (1 - y) computed? There is an explanation of how to calculate h(x) and (1 - h(x)); however, I am unable to understand how to compute y and (1 - y).

- And what is the value of alpha?

The y^{(i)} values are the “labels” on the data. They are just given to you as input. Since this is a binary classification problem, each y^{(i)} is either 0 or 1, so only one of those terms in the loss function will be non-zero for each input sample.
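A small numeric sketch may make this concrete. The labels and predictions below are made-up values, not from the assignment; the point is just that because each label is 0 or 1, exactly one of the two terms in the cross-entropy loss survives for each sample:

```python
import numpy as np

# y: the given labels from the dataset (each is 0 or 1).
# h: the model's predictions h(x) (each in (0, 1)).
# These particular numbers are illustrative only.
y = np.array([1, 0, 1, 1])
h = np.array([0.9, 0.2, 0.7, 0.6])

# Per-sample loss: -[ y*log(h) + (1-y)*log(1-h) ].
# When y[i] == 1 only the first term is non-zero;
# when y[i] == 0 only the second term is non-zero.
per_sample = -(y * np.log(h) + (1 - y) * np.log(1 - h))
loss = per_sample.mean()
```

For the first sample, y = 1, so the loss reduces to -log(h) = -log(0.9); for the second, y = 0, so it reduces to -log(1 - h) = -log(0.8).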

The learning rate \alpha is a hyperparameter, meaning a value that you simply choose. It controls how big a “step” you take on each iteration of gradient descent: how far you move in the direction of the current gradients. There is no magic way to know the right value *a priori* for any given problem; you have to run some experiments to figure out what works. Here they have already figured that out for us and just given us a value in the test case. That said, note that this is the simplest version of gradient descent, with a fixed learning rate. Once we get to the level of sophistication of using TensorFlow, PyTorch, or another ML platform, there are more complex algorithms that can dynamically adjust the learning rate.
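To see where \alpha enters, here is a minimal sketch of fixed-learning-rate gradient descent for logistic regression. The function name, the default alpha of 0.1, and the iteration count are all my own illustrative choices, not the course's values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Fit logistic regression weights with a fixed learning rate alpha."""
    w = np.zeros(X.shape[1])
    b = 0.0
    m = len(y)
    for _ in range(iters):
        h = sigmoid(X @ w + b)           # current predictions h(x)
        grad_w = (X.T @ (h - y)) / m     # gradient of the loss w.r.t. w
        grad_b = (h - y).mean()          # gradient of the loss w.r.t. b
        w -= alpha * grad_w              # step of size alpha down the gradient
        b -= alpha * grad_b
    return w, b
```

Too small an \alpha and the loop crawls toward the minimum; too large and the updates overshoot and can diverge, which is why the value usually has to be found by experiment.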