Hi, for the alpaca model, the activation function used is ‘linear’. Is there any reason why we have not used ‘sigmoid’?
Because of the way the model is compiled:

model2.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate),
               loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
               metrics=['accuracy'])
We specify from_logits=True here, which tells BinaryCrossentropy to treat the model's raw (linear) outputs as logits and to apply the sigmoid itself inside the loss. So you still effectively get a sigmoid on the last layer, but it is folded into the loss computation, which is more numerically stable than applying a sigmoid activation in the layer and then taking the log in the loss.
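To make the equivalence concrete, here is a minimal sketch (not the assignment's actual model; the toy layers and data are only for illustration) comparing a linear output with from_logits=True against a sigmoid output with from_logits=False:

import tensorflow as tf

# Option A: linear (no activation) output + from_logits=True.
# The loss applies the sigmoid internally in a numerically stable way.
logits_model = tf.keras.Sequential([
    tf.keras.layers.Dense(1)  # raw logits, no activation
])
logits_loss = tf.keras.losses.BinaryCrossentropy(from_logits=True)

# Option B: sigmoid output + from_logits=False (the default).
# Mathematically the same, but computing sigmoid and log separately
# can lose precision for large-magnitude logits.
prob_loss = tf.keras.losses.BinaryCrossentropy(from_logits=False)

# Quick check on toy data: both formulations give (numerically) the same loss.
x = tf.random.normal((4, 3))
y = tf.constant([[0.], [1.], [1.], [0.]])

logits = logits_model(x)
print(logits_loss(y, logits).numpy())                 # loss computed from logits
print(prob_loss(y, tf.sigmoid(logits)).numpy())       # ~ same value from probabilities

At prediction time, remember the model then outputs logits, so you pass them through tf.sigmoid (or tf.nn.sigmoid) yourself if you want probabilities.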
Yes, I understand it now. Thanks!