The description of the loss function says that when the sample label y^(i) = 1, the predicted value y_hat^(i) is expected to be close to 1. But shouldn't the loss function value be expected to be close to 0 instead of 1 (as below)?

I think you are right that the way they said that is confusing. The "close to 1" refers to the prediction, not the loss: they are saying that \hat{y}^{(i)} should be close to 1, which will then make the loss -log(\hat{y}^{(i)}) close to 0.
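
If it helps to see the numbers, here is a quick sketch of that relationship. It only evaluates the -log(\hat{y}^{(i)}) term discussed above for the y^(i) = 1 case; the function name and the sample prediction values are made up for illustration and are not from the course material.

```python
import math

def loss_when_label_is_one(y_hat):
    """Loss term for a sample whose label is 1: -log(y_hat).

    Hypothetical helper for illustration; y_hat must be in (0, 1].
    """
    return -math.log(y_hat)

# As the prediction moves toward 1, the loss moves toward 0.
for y_hat in (0.5, 0.9, 0.99, 0.999):
    print(f"y_hat = {y_hat}: loss = {loss_when_label_is_one(y_hat):.4f}")
```

So a prediction near 1 and a loss near 0 are the same statement seen from two sides, while a prediction near 0 would make the loss grow very large.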