Binary classification - Simplest classification task

I’ve created for myself the most basic classification task.

The input has 2 training examples with one feature each, [[-1], [1]], and the outputs are [1, 0].
I've created a neural network with 1 neuron and a linear activation function, and the model uses binary crossentropy loss.

Here’s the code of the described example.

import numpy as np
import tensorflow as tf

features = np.array([[-1.], [1.]], dtype=np.float16)
labels = np.array([1., 0.], dtype=np.float16)

model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(1,)),
    tf.keras.layers.Dense(1, activation='linear', dtype=tf.float16)
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-2),
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics='accuracy'
)

model.fit(features, labels, epochs=100000)  # batch_size is left at its default

If I'm correct, this task should be fairly easy to train to 100%, meaning I can get a loss of 0 (although the accuracy is already 100%). But the model is unable to overfit the training data to a loss of 0 (or to predict a probability of exactly 1 for '-1' and 0 for '1').
Am I missing something? Why doesn't the loss converge to 0?
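
For reference, here is how I check what the model actually predicts (a small sketch; since the Dense layer is linear and the loss uses from_logits=True, predict() returns logits that still need a sigmoid):

# Convert the raw logits from predict() into probabilities.
logits = model.predict(features)
probs = tf.sigmoid(logits)
print(probs.numpy())  # ideally close to [[1.], [0.]]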

If I turn this into a linear regression task by switching the loss from BinaryCrossentropy to MeanSquaredError, the model does converge to zero.
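
For completeness, the regression variant only differs in the compile call (a sketch reusing the same features, labels, and model as above):

# Same single-neuron model, compiled as a regression problem instead.
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-2),
    loss=tf.keras.losses.MeanSquaredError()
)
model.fit(features, labels, epochs=100000)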

A single dense layer with no hidden layer isn’t really an NN. It is just regression.

Try it without the logits, or try a different learning rate or more iterations.
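
Roughly something like this, if it helps (an untested sketch of what I mean by "without the logits"):

# Let the layer output a probability directly instead of a logit.
model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(1,)),
    tf.keras.layers.Dense(1, activation='sigmoid', dtype=tf.float16)
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-2),
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
    metrics='accuracy'
)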

Yeah, I did try a lot of things, but I wasn't able to get the logistic regression to learn.

Sorry, but I am currently on leave and don't have the ability to try your experiment.

@Michal_Majk_Ritcherd, were you able to make progress on this experiment?

Yes, I was.
I needed a much higher learning rate (2.55).
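
The only change was the optimizer's learning rate (a rough sketch):

# Same float16 model as before, just recompiled with a much larger learning rate.
model.compile(
    optimizer=tf.keras.optimizers.Adam(2.55),
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics='accuracy'
)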

But this did not work with tf.float32 and tf.float64; there I needed some changes, as follows:
I created a new model with 1 hidden layer of 6 neurons (ReLU activation), set a fairly high learning rate (1.29), and was able to train the NN, although it depends on the parameter initialization.
In one run I was able to train the NN in just 10 epochs; in another I wasn't (I tried 300,000 epochs, and possibly it would have learned with far more).
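
For reference, a sketch of the float32 setup I described (the names features32 and model32 are just for illustration):

# Two examples, one feature each, now in float32.
features32 = np.array([[-1.], [1.]], dtype=np.float32)
labels32 = np.array([1., 0.], dtype=np.float32)

# One hidden layer with 6 ReLU units, then a linear output (a logit).
model32 = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(1,)),
    tf.keras.layers.Dense(6, activation='relu'),
    tf.keras.layers.Dense(1, activation='linear')
])
model32.compile(
    optimizer=tf.keras.optimizers.Adam(1.29),
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics='accuracy'
)
model32.fit(features32, labels32, epochs=10)  # 10 epochs was enough in one run, but not in others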

But I'm still wondering why I can't train these 2 scenarios the same way as with tf.float16, i.e. with just the output layer…

That's not unusual for neural networks. Their cost function is not convex, so you can get stuck in a local minimum.

What weight and bias values did you get when it did converge?

Just FYI, I used your data set of two examples, using both logistic regression and an NN with one hidden layer, implemented in a different toolset, and it converged very quickly.

When it converged, I got weight = [[-70.1]] and bias = [-16.4]. Another run gives different parameters, which is expected.
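
I read them out like this (sketch):

# The Dense layer's kernel and bias after training.
weights, bias = model.layers[-1].get_weights()
print(weights, bias)  # e.g. [[-70.1]] and [-16.4]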

implemented in a different toolset, and it converged very quickly

What toolset did you use? I'm still not sure why I can't train the logistic regression with a different data type such as tf.float32 or tf.float64. I know the updates will be small, but in my case it gets stuck and doesn't change at all…

Anyway thanks for your time.

I don’t think the size of the float data has anything to do with your issue.

I agree with you, maybe I have a bug somewhere in my code :smiley: