Alpaca model

I am not clear on why the output (top) layer of the final alpaca model uses a linear activation, as in linear regression. Can someone please explain?

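For mathematical convenience, we use a linear output and then give the loss function (a crossentropy loss: binary crossentropy for a single-unit output, categorical crossentropy for several classes) the argument “from_logits=True”. The loss then applies the sigmoid or softmax itself when it computes the cost, so the model does not need an activation on its output layer.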

(If I remember the details correctly, I can’t look it up at the moment)

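Because TensorFlow then computes the activation and the logarithm together inside the loss, this method is more numerically stable and accurate than applying the activation in the model and taking the log afterwards. The model's raw outputs are logits, so if you need probabilities at prediction time you have to apply the sigmoid or softmax yourself.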
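
To make the accuracy point concrete, here is a minimal pure-Python sketch for the binary case. It is not TensorFlow's actual code, and the function names naive_bce and bce_from_logits are made up for illustration. The second function uses the rearranged formula described in the documentation for tf.nn.sigmoid_cross_entropy_with_logits. The example feeds both versions a confident but wrong prediction.

```python
import math

def naive_bce(logit, label):
    """Apply the sigmoid first, then take cross-entropy of the probability."""
    p = 1.0 / (1.0 + math.exp(-logit))
    try:
        return -(label * math.log(p) + (1 - label) * math.log(1 - p))
    except ValueError:
        # p was rounded to exactly 0.0 or 1.0, so log(0) was requested
        return math.inf

def bce_from_logits(logit, label):
    """Cross-entropy computed directly from the logit (stable form)."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

logit, label = 40.0, 0              # model is very sure of class 1, truth is 0
print(naive_bce(logit, label))        # inf
print(bce_from_logits(logit, label))  # 40.0
```

In the naive version the sigmoid rounds to exactly 1.0 in floating point, so the loss blows up and carries no useful gradient. Working from the logit keeps the loss finite, which is the situation “from_logits=True” is meant to handle.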