Hi,
I understand top-K and top-P sampling when each is applied on its own. But when both are non-zero, how does the model select the output?
Then we should sample only from the tokens that satisfy both the top-K and the top-P criteria.
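
To make that concrete, here is a minimal sketch of the "satisfy both" reading, using only the Python standard library. The function names (`top_k_top_p_filter`, `sample`) and the toy probabilities are made up for illustration, and this is not taken from any particular model's code. It assumes a value of 0 disables a filter. Some implementations instead apply top-K first and then compute top-P on the renormalized remainder, which can keep a slightly different set of tokens, so check what your library actually does.

```python
import random


def top_k_top_p_filter(probs, top_k=0, top_p=0.0):
    """Return token indices that pass BOTH filters, most probable first.

    Assumption for this sketch: top_k == 0 or top_p == 0.0 disables that filter.
    """
    # Rank token indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-K criterion: the K most probable tokens.
    k_set = set(order[:top_k]) if top_k > 0 else set(order)

    # Top-P criterion: the smallest prefix of the ranking whose
    # cumulative probability reaches top_p.
    if top_p > 0.0:
        p_set = set()
        cumulative = 0.0
        for i in order:
            p_set.add(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
    else:
        p_set = set(order)

    # Keep only tokens that satisfy both criteria.
    return [i for i in order if i in k_set and i in p_set]


def sample(probs, top_k=0, top_p=0.0, rng=random):
    """Sample one token index from the tokens that survive both filters."""
    keep = top_k_top_p_filter(probs, top_k, top_p)
    # random.choices normalizes the weights, so no explicit renormalization is needed.
    weights = [probs[i] for i in keep]
    return rng.choices(keep, weights=weights, k=1)[0]


# Toy distribution over 5 tokens (hypothetical values).
probs = [0.5, 0.3, 0.1, 0.06, 0.04]
print(top_k_top_p_filter(probs, top_k=2, top_p=0.85))  # top-K is the tighter filter here
print(top_k_top_p_filter(probs, top_k=4, top_p=0.85))  # top-P is the tighter filter here
print(sample(probs, top_k=4, top_p=0.85))
```

Because both filters keep a prefix of the same probability ranking, the intersection is simply whichever filter is stricter for the current distribution.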