Why not use y in {-1,1} and skip sigmoid altogether?

In one of the week 3 videos, Prof. Ng says that he only uses sigmoid for binary classification with y_train in {0,1}, but that sigmoid is otherwise not as good as tanh. Why not simply use y_train in {-1,1} and use tanh in the last step?

Hi @G11,

I believe Andrew shares the recommended approaches in his lectures. You are welcome to explore alternatives as well and share your results.
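
For anyone who wants to try this, here is a minimal sketch (not from the course materials) of how the two setups relate. It relies on the identity tanh(z) = 2*sigmoid(2z) - 1, so a tanh output with labels in {-1,1} is a shifted and rescaled version of a sigmoid output with labels in {0,1}. The variable names (`y01`, `y_pm`, `a_tanh`) are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5.0, 5.0, 11)

# tanh is a shifted and rescaled sigmoid: tanh(z) = 2 * sigmoid(2z) - 1
identity_holds = np.allclose(np.tanh(z), 2.0 * sigmoid(2.0 * z) - 1.0)

# Convert labels from the {0,1} convention to the {-1,1} convention
y01 = np.array([0, 1, 1, 0])
y_pm = 2 * y01 - 1

# Convert a tanh output in (-1, 1) back to a probability in (0, 1)
a_tanh = np.tanh(z)
p = (a_tanh + 1.0) / 2.0  # same values as sigmoid(2z)
prob_matches = np.allclose(p, sigmoid(2.0 * z))

print(identity_holds, prob_matches, y_pm)
```

One thing to watch if you experiment: the binary cross-entropy loss expects a probability in (0,1), so a tanh output would need to be rescaled as above (or paired with a different loss) before it can be used in the last step.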