I’ve set up the most basic classification task I could think of.
The input has 2 training examples with one feature each, [[-1], [1]], and the labels are [1, 0].
I’ve created a neural network with 1 neuron with a linear activation function, and the model uses binary cross-entropy loss.
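A minimal sketch reproducing this setup (the optimizer, learning rate, and epoch count are my assumptions; the post doesn’t specify them):

```python
import tensorflow as tf

# two training examples, one feature each
X = tf.constant([[-1.0], [1.0]])
y = tf.constant([[1.0], [0.0]])

# a single neuron with a linear activation, trained with binary
# cross-entropy; optimizer and learning rate are assumptions
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(1, activation="linear"),
])
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=["accuracy"],
)
model.fit(X, y, epochs=1000, verbose=0)
print(model.predict(X))
```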
If I am correct, this task should be fairly easy to train to 100%, meaning I should be able to get the loss down to 0 (the accuracy does reach 100%). But the model is unable to overfit the training data to 0 loss, i.e., to predict probability exactly 1 for ‘-1’ and exactly 0 for ‘1’.
Am I missing something? Or why doesn’t it converge to 0?
If I turn this into a linear regression task by switching the loss from BinaryCrossentropy to MeanSquaredError, the model does converge to zero.
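For reference, that change is a one-liner (a sketch, reusing the assumed model and optimizer from above):

```python
# the regression variant: same model, mean squared error loss;
# with this the training loss does reach zero
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
    loss=tf.keras.losses.MeanSquaredError(),
)
model.fit(X, y, epochs=1000, verbose=0)
```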
Yes, I was missing something.
I needed a much higher learning rate (2.55).
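A sketch of that fix, assuming the tf.float16 setup mentioned below was enabled via Keras’s floatx setting (that detail is my assumption):

```python
# same single-neuron model, but with a much higher learning rate;
# enabling float16 via set_floatx is an assumption about the setup
tf.keras.backend.set_floatx("float16")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(1, activation="linear"),
])
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=2.55),
    loss=tf.keras.losses.BinaryCrossentropy(),
)
```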
But this did not work with tf.float32 and tf.float64; for those I needed some changes, as follows:
I created a new model with 1 hidden layer of 6 neurons (ReLU activation), set a pretty high learning rate (1.29), and was able to train the NN, although it depends on the parameter initialization; see the sketch below.
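A sketch of that deeper model (keeping the linear output layer and the assumed SGD optimizer from above):

```python
# float32 variant with one hidden ReLU layer; whether training
# succeeds depends on the random initialization
tf.keras.backend.set_floatx("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(6, activation="relu"),
    tf.keras.layers.Dense(1, activation="linear"),
])
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=1.29),
    loss=tf.keras.losses.BinaryCrossentropy(),
)
```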
In one run I was able to train the NN in just 10 epochs; in another I wasn’t (I tried 300,000 epochs, and possibly it would learn with far more).
But I’m still wondering why I can’t train these two scenarios the same way as with tf.float16, i.e., with just the output layer…
Just FYI, I used your data set of two examples with both logistic regression and an NN with one hidden layer, implemented in a different toolset, and it converged very quickly.
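For illustration only, here is how quickly plain logistic regression fits this data in scikit-learn; this is not necessarily the commenter’s toolset, which the thread does not name:

```python
# logistic regression on the same two examples; scikit-learn is used
# purely as an example library, not as the commenter's (unnamed) toolset
from sklearn.linear_model import LogisticRegression

X = [[-1.0], [1.0]]
y = [1, 0]

# weak regularization (large C) lets the probabilities approach 0 and 1
clf = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)
print(clf.predict_proba(X))  # rows ~[0, 1] for x=-1 and ~[1, 0] for x=1
```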
> implemented in a different toolset, and it converged very quickly
What toolset did you use? I’m still not sure why I can’t train the logistic regression with other data types such as tf.float32 and tf.float64. I know the updates will be small, but in my case it gets stuck and doesn’t change at all…
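One way to check whether training is genuinely stuck (my suggestion, not something from the thread) is to look at the raw gradients directly:

```python
import tensorflow as tf

# inspect the gradients of the single-neuron model under tf.float32 to
# see whether they are exactly zero (stuck) or merely very small
tf.keras.backend.set_floatx("float32")
X = tf.constant([[-1.0], [1.0]])
y = tf.constant([[1.0], [0.0]])

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(1, activation="linear"),
])
loss_fn = tf.keras.losses.BinaryCrossentropy()

with tf.GradientTape() as tape:
    loss = loss_fn(y, model(X))
for var, grad in zip(model.trainable_variables,
                     tape.gradient(loss, model.trainable_variables)):
    print(var.name, grad.numpy())
```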