The cost function for logistic regression is -y*log(y_) - (1-y)*log(1-y_).
But for Language Model & sequence generation (at 10:26), the cost is given as just -y*log(y_).

Isn't the second term needed?

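The two forms are equivalent. The logistic regression loss is the two-class case of the general cross-entropy loss -Σ_i y_i*log(y_i_), where y is a one-hot label over the classes. With two classes, the predicted probabilities are y_ and 1-y_ and the labels are y and 1-y, so the sum is exactly -y*log(y_) - (1-y)*log(1-y_). The "second term" is just the term for the other class. In a language model the softmax outputs a probability for every word in the vocabulary, and because y is one-hot, every term except the one for the true word is multiplied by 0. That leaves -y*log(y_) for the correct word only. The wrong words are still penalized indirectly, because the softmax makes their probabilities add up to 1 minus the correct word's probability.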
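Here is a minimal Python sketch of that equivalence for a single example. The function names are illustrative and do not come from any library or from the course. The two-class call should agree with the binary (logistic regression) loss. The second call uses a made-up 4-word vocabulary, where only the true word's probability enters the loss.

```python
import math

def binary_cross_entropy(y, y_hat):
    """Logistic regression loss for one example: y in {0, 1}, y_hat = P(y = 1)."""
    return -y * math.log(y_hat) - (1 - y) * math.log(1 - y_hat)

def categorical_cross_entropy(y_one_hot, y_hat):
    """Softmax loss for one example: sum over classes of -y_i * log(y_hat_i).

    Terms with y_i = 0 contribute nothing, so they are skipped (this also
    avoids log(0) for classes the model assigns zero probability).
    """
    return -sum(t * math.log(p) for t, p in zip(y_one_hot, y_hat) if t > 0)

# Two classes: index 1 is "y = 1" with probability y_hat, index 0 is "y = 0".
y, y_hat = 1, 0.8
print(binary_cross_entropy(y, y_hat))
print(categorical_cross_entropy([1 - y, y], [1 - y_hat, y_hat]))  # same value, -log(0.8)

# Hypothetical softmax output over a 4-word vocabulary; the true next word is index 1.
probs = [0.1, 0.6, 0.2, 0.1]
target = [0, 1, 0, 0]
print(categorical_cross_entropy(target, probs))  # -log(0.6)
```

Skipping the zero-label terms changes nothing mathematically. It just makes explicit why the language model loss reduces to a single -log term per time step.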