Why don't we delete the output layer, setting the activation function there to be linear, in order to reduce error by delivering z directly to model.compile?
We still need the output layer weights.
The output layer in a multiclass classification network plays a critical role: it transforms the learned features from the hidden layers into the final predictions (class probabilities). Its weights define the mapping from features to class scores, and they are trainable parameters learned alongside the rest of the network. Deleting the layer would delete that mapping entirely.
As for passing the raw logits (denoted by z) directly to model.compile for training: in classification tasks, raw logits must still be transformed into probabilities by a softmax or sigmoid before they can be compared against labels by a loss such as categorical cross-entropy. Without that transformation, the loss is computed on unbounded, poorly scaled outputs, which can produce large gradients and make it difficult for the optimizer to converge.
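Here is a minimal sketch of what the "linear output + raw logits" setup actually looks like in Keras (the layer sizes, 20 input features and 10 classes, are hypothetical). Note that the Dense output layer and its weights remain; only its activation becomes linear, and the softmax transformation is then applied inside the loss via `from_logits=True` rather than removed:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical shapes: 20 input features, 10 output classes.
model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(64, activation="relu"),
    # The output layer stays: its weights map hidden features to one
    # score (logit) per class. Only the activation is linear now.
    layers.Dense(10, activation="linear"),
])

# from_logits=True tells the loss to apply the softmax internally,
# so the probability transformation still happens, just inside the
# loss computation instead of inside a softmax output layer.
model.compile(
    optimizer="adam",
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
```

At inference time the model then outputs logits, so you would apply a softmax to them yourself to recover class probabilities.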