I am really struggling to understand why we set the output layer to linear when we add the flag from_logits=True in the improved implementation of softmax regression, and then later apply softmax to the model's output with this code:

logits = model(X)
f_x = tf.nn.softmax(logits)

How is this conceptually the same thing as the previous implementation?
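One way to see why the two versions agree is to run the same made-up weights through both pipelines. The sketch below is plain NumPy rather than TensorFlow, and the data, weights, and the softmax helper are hypothetical stand-ins: "softmax inside the output layer" and "linear output layer, then softmax afterwards" perform exactly the same arithmetic, so they produce the same probabilities and the same predicted class.

```python
import numpy as np

def softmax(z):
    # Subtracting the row max avoids overflow; it does not change the result.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))   # 4 examples, 3 features (made-up data)
W = rng.normal(size=(3, 5))   # hypothetical weights for a 5-class output layer
b = rng.normal(size=5)

# Original approach: softmax activation inside the output layer.
probs_softmax_layer = softmax(X @ W + b)

# "Improved" approach: the linear output layer produces logits...
logits = X @ W + b
# ...and softmax is applied afterwards, like f_x = tf.nn.softmax(logits).
f_x = softmax(logits)

print(np.allclose(probs_softmax_layer, f_x))                 # True
print(np.allclose(f_x.sum(axis=1), 1.0))                     # True: each row is a distribution
print((logits.argmax(axis=1) == f_x.argmax(axis=1)).all())   # True: same predicted class
```

The only thing that moves is *where* softmax is applied; the layer's weights and the resulting probabilities are the same.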

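When you use a linear output layer together with the matching loss function (e.g. SparseCategoricalCrossentropy) and specify from_logits=True, the loss function applies softmax to the logits internally while computing the cross-entropy. Because it combines the softmax and the log in a single computation, this is more numerically accurate than computing the probabilities first and then taking their log.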
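A minimal NumPy sketch of the idea, assuming sparse integer labels: this is not TensorFlow's actual implementation, and ce_from_probs and ce_from_logits are hypothetical helper names. The "from logits" version folds softmax into the loss using log softmax(z)_y = z_y − logsumexp(z), which gives the same value as "softmax, then log" for ordinary logits but stays finite when the separate softmax underflows to exactly 0.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ce_from_probs(probs, y):
    # Loss when the model already outputs probabilities (softmax output layer).
    return -np.log(probs[np.arange(len(y)), y])

def ce_from_logits(logits, y):
    # Loss when the model outputs logits: softmax and log are combined via
    # log-sum-exp, so the probabilities are never formed explicitly.
    m = logits.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    return lse - logits[np.arange(len(y)), y]

y = np.array([0, 2])  # true class per example

moderate = np.array([[2.0, 1.0, 0.1], [0.5, 0.2, 3.0]])
print(np.allclose(ce_from_probs(softmax(moderate), y),
                  ce_from_logits(moderate, y)))  # True: same loss value

# With extreme logits the separate softmax rounds the true-class probability
# to 0, so its log becomes -inf and the loss becomes inf.
extreme = np.array([[-1000.0, 0.0, 1000.0], [0.0, 0.0, 0.0]])
with np.errstate(divide="ignore"):
    print(ce_from_probs(softmax(extreme), y))
print(ce_from_logits(extreme, y))  # finite values
```

In TensorFlow, passing from_logits=True to the loss is what tells it to take the second path; nothing about the model itself changes.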

Thank you for your kind response, but I am still a bit unclear about this. How does it know it's supposed to use softmax when we haven't specified it anywhere and the output layer is linear? Without softmax, wouldn't the model be outputting something else entirely?
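A small NumPy check that may help with this follow-up, assuming a single example with made-up logits (ce_from_logits is a hypothetical helper, not a TensorFlow function). The model does output something different, namely raw logits, but the softmax lives inside the loss: the gradient of the from-logits cross-entropy with respect to those logits is softmax(z) − one_hot(y), which is exactly the gradient that drives training with a softmax output layer. So the weights are learned as in ordinary softmax regression, and softmax only has to be applied again at prediction time.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def ce_from_logits(z, y):
    # Cross-entropy for one example, computed directly from the logits z.
    m = z.max()
    return m + np.log(np.exp(z - m).sum()) - z[y]

z = np.array([1.5, -0.3, 0.8])  # logits from a linear output layer (made up)
y = 2                            # true class index

# Numerical gradient of the loss w.r.t. each logit (central differences).
eps = 1e-6
basis = np.eye(3)
num_grad = np.array([
    (ce_from_logits(z + eps * basis[i], y) - ce_from_logits(z - eps * basis[i], y)) / (2 * eps)
    for i in range(3)
])

one_hot = basis[y]
print(np.allclose(num_grad, softmax(z) - one_hot, atol=1e-6))  # True
```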