I am not clear on why we are using a linear activation in the output/top layer of the final alpaca model. Can someone please explain?
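For mathematical convenience, we use a linear output (no activation function on the last layer) but then tell the loss function, not the optimizer, to treat those outputs as logits: we use categorical crossentropy with from_logits=True. That does not add a softmax layer to the model. Instead, the softmax (or the sigmoid, for the binary version of the loss) is applied inside the loss calculation.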
(If I remember the details correctly, I can’t look it up at the moment)
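When the activation and the loss are computed together from the logits, TensorFlow can use a numerically stable formulation, so this method is more accurate than having a separate softmax output layer and then computing the loss on its probabilities. One consequence: the trained model outputs logits, so you need to apply the softmax (or sigmoid) yourself when you want probabilities at prediction time.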
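To illustrate the accuracy point, here is a minimal plain-Python sketch (not TensorFlow's actual implementation) that compares the two routes to the same crossentropy value. The function names and the eps clipping value are assumptions made for this illustration; the clipping stands in for the kind of guard a framework applies to avoid log(0) when it is only given probabilities.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating so exp() cannot overflow.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def loss_from_probs(probs, true_index, eps=1e-7):
    # Route 1: the model outputs probabilities and the loss takes their log.
    # eps is an assumed clipping value that guards against log(0).
    p = min(max(probs[true_index], eps), 1.0 - eps)
    return -math.log(p)

def loss_from_logits(logits, true_index):
    # Route 2: the loss works on the logits directly, using
    # -log(softmax(z)[k]) = logsumexp(z) - z[k].
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[true_index]

# Moderate logits: both routes agree.
mild = [2.0, 1.0, 0.1]
print(loss_from_probs(softmax(mild), 0), loss_from_logits(mild, 0))

# Confidently wrong prediction: the true class has a tiny probability,
# so the clipped route saturates while the logit route keeps the full value.
extreme = [30.0, 0.0, -10.0]
print(loss_from_probs(softmax(extreme), 1), loss_from_logits(extreme, 1))
```

In the second case the probability route cannot report a loss larger than -log(eps), while the logit route returns a value close to 30, which is the true crossentropy for those logits. That lost information is the sort of inaccuracy that working from logits avoids.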