Confusion about improvements made to reduce numerical roundoff error

Why don't we just delete the output layer? Since we set its activation function to linear in order to reduce error, couldn't we deliver z to model.compile directly?

We still need the output layer weights.


The output layer in a multiclass classification network plays a critical role in transforming the learned features from the hidden layers into the final predictions (class probabilities). Thus, the output layer’s weights are essential for learning the correct mapping from features to class predictions. They are trainable parameters that fine-tune the performance of the model.
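To make this concrete, here is a minimal sketch (assuming TensorFlow/Keras, with illustrative layer sizes) of the setup being discussed: the output layer keeps its trainable weight matrix and bias even though its activation is linear, so the model emits raw logits z.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(25, activation="relu"),
    tf.keras.layers.Dense(15, activation="relu"),
    # Output layer: still has trainable weights and biases to learn,
    # even though its activation is linear (no softmax applied here).
    tf.keras.layers.Dense(10, activation="linear"),
])

# from_logits=True tells the loss to apply the softmax internally, in a
# numerically stable way, so the model itself can output raw logits.
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=tf.keras.optimizers.Adam(0.001),
)
```

Deleting the output layer would remove exactly the weights that map the last hidden layer's features to the class scores; the linear activation only changes what the layer outputs, not whether its parameters are needed.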

Passing raw logits (denoted by z) directly into training via model.compile could also be problematic: for classification tasks, raw logits must be transformed into probabilities by a softmax or sigmoid function, which allows training against interpretable loss functions such as categorical cross-entropy. Without a proper transformation, the loss calculation can produce large gradients or poorly scaled outputs, making it difficult for the optimizer to converge.
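As a small illustration (TensorFlow assumed; the logit values are made up), logits are arbitrary real numbers and only become probabilities after a softmax, while the `from_logits=True` loss fuses the softmax and the cross-entropy into one numerically stable step:

```python
import tensorflow as tf

# Hypothetical raw logits for a single example in a 3-class problem.
logits = tf.constant([[2.0, -1.0, 0.5]])

# Explicit softmax turns logits into interpretable class probabilities.
probs = tf.nn.softmax(logits)
print(probs.numpy())  # approx. [[0.79, 0.04, 0.18]]; rows sum to 1

# The stable route for training: let the loss apply softmax internally,
# avoiding the roundoff error of computing softmax and log separately.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
y_true = tf.constant([0])
print(loss_fn(y_true, logits).numpy())  # approx. 0.24
```

At inference time you would still apply `tf.nn.softmax` yourself if you want probabilities rather than raw scores.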
