LLM sequence probabilities

I have written a library for controlling LLMs using a language definition.

This is similar to using context-free grammars to control LLMs, except my grammars are not context-free.

Now I want to do stuff with probabilities of tokens and sequences of tokens.

I am considering adding a new feature to the grammar specification.
The idea is to have a way to adjust the probabilities of the various grammar branches. For example (in pseudo notation), a rule X:

X : Yes | No

This means that you expect either ‘Yes’ or ‘No’ and nothing else. What I want is to be able to adjust the probabilities of the branches.

For example:

X : Yes * 0.5 | No

would mean that ‘Yes’ is now half as probable as ‘No’, in a “completely neutral context”. So the LLM would need the probability of ‘Yes’ to be twice as high as that of ‘No’ for it to be chosen over ‘No’ as the next sequence of tokens. The idea is basically that the LLM would have to be very certain that ‘Yes’ is the right word to continue with for it to choose that. In other words, it would err on the side of caution, choosing ‘No’ in this example.
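
To make the intended semantics concrete, here is a minimal sketch of the selection rule, assuming we already have a probability for each branch (all names and numbers here are made up for illustration):

```python
# Minimal sketch of the weighted branch selection described above.
# The weights are hypothetical grammar annotations, not part of any library.
branch_weights = {"Yes": 0.5, "No": 1.0}  # the 0.5 penalizes 'Yes'

def pick_branch(model_probs: dict[str, float],
                weights: dict[str, float]) -> str:
    # Scale each branch's model probability by its grammar weight and take
    # the argmax: 'Yes' wins only if P(Yes) > 2 * P(No).
    return max(model_probs, key=lambda b: model_probs[b] * weights[b])

print(pick_branch({"Yes": 0.6, "No": 0.4}, branch_weights))  # No  (0.30 < 0.40)
print(pick_branch({"Yes": 0.7, "No": 0.3}, branch_weights))  # Yes (0.35 > 0.30)
```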

I understand that there are a couple of problems with this.

First of all, converting logits to probabilities could lose a lot of precision, so even if the math were correct, doing this by looking at probabilities might not work in practice.
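
That said, I think the precision issue can be sidestepped by never leaving log space: multiplying a probability by a weight w is equivalent, up to renormalization, to adding log(w) to the corresponding logit, because softmax exponentiates. A small sketch, assuming PyTorch tensors and made-up values:

```python
import math

import torch

logits = torch.tensor([2.1, 1.3, -0.4])
weight, token_id = 0.5, 0

# Route 1: convert to probabilities, scale, renormalize.
probs = torch.softmax(logits, dim=-1)
probs[token_id] *= weight
probs /= probs.sum()

# Route 2: add log(weight) to the logit and take softmax directly.
biased = logits.clone()
biased[token_id] += math.log(weight)

# Both routes yield the same distribution (and hence the same ranking),
# but route 2 never round-trips through probability space.
print(probs)
print(torch.softmax(biased, dim=-1))
```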

Secondly, I’m not sure we’re really talking about probabilities here. It may be that the selection of the next token is a matter of sorting order that doesn’t map to actual probabilities. It could be possible to map the sorting order to “probabilities”, though, and even to normalize the probabilities of a list of alternatives.
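
On the sorting-order point: softmax is strictly increasing, so ordering tokens by logit and ordering them by probability give the same result, and a softmax over just the grammar-legal alternatives yields normalized “probabilities” for that list. A sketch with made-up values:

```python
import torch

logits = torch.tensor([3.0, 1.0, 0.5, -2.0])
full_probs = torch.softmax(logits, dim=-1)
# Softmax preserves order, so the two sorts agree.
assert torch.equal(logits.argsort(), full_probs.argsort())

# Hypothetical ids of the tokens that may start 'Yes' and 'No'.
legal_ids = torch.tensor([0, 2])
local_probs = torch.softmax(logits[legal_ids], dim=-1)
print(local_probs)  # normalized over just the legal alternatives; sums to 1
```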

Thirdly, I’m not sure whether it is possible to compare probabilities of sequences of different lengths, or whether I would have to accept a solution that only looks at the first token in each sequence.
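
One heuristic I have seen for comparing sequences of different lengths is length normalization: compare the mean log-probability per token instead of the raw sum, since longer sequences accumulate more negative terms. A sketch with made-up numbers:

```python
# A sequence's log-probability is the sum of its per-token conditional
# log-probabilities; dividing by length gives a mean that is comparable
# across sequences of different lengths. The numbers are invented.
def sequence_logprob(token_logprobs: list[float]) -> float:
    return sum(token_logprobs)

def length_normalized(token_logprobs: list[float]) -> float:
    return sequence_logprob(token_logprobs) / len(token_logprobs)

yes = [-0.2]                       # hypothetical one-token branch
no_thank_you = [-0.3, -0.1, -0.1]  # hypothetical three-token branch

print(sequence_logprob(yes), sequence_logprob(no_thank_you))    # -0.2 vs -0.5
print(length_normalized(yes), length_normalized(no_thank_you))  # -0.2 vs ~-0.167
```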

Does anyone have any experience with this? Perhaps I have misunderstood how LLMs work.

Also, I want the weight mechanism to be easy to control, with predictable outcomes, and I understand that may be difficult to do well.

I’m hoping for some input on this.

If it is relevant, here is the project URL:

Have you studied Decision Trees?

No, I can’t say I’m familiar with Decision Trees. What is their relevance to what I’m trying to do?

Your description of using Yes/No thresholds (and potentially having to adjust the threshold values) brought decision trees to mind.

I understand. The Yes|No example is just an example of a grammar, though.

I think I just want to be able to adjust probabilities. I did this in an earlier version of this project: I converted logits to probabilities, multiplied by a factor, and then converted back to logits. It didn’t seem like the correct way to do it, and it didn’t work as expected.

In the current implementation, which I call eponec, I deal with logits in a LogitsProcessor, in the correct way according to the documentation and the examples shipped with transformers. Therefore I left that feature out. Now I want to put it back in, but I want to do it correctly and not hackishly.
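
To show the direction I have in mind (this is only a sketch, not eponec’s actual code), here is what a weight-aware LogitsProcessor could look like: it adds log(weight) to the logits of the tokens that can begin a weighted branch, so everything stays in logit space:

```python
import math

import torch
from transformers import LogitsProcessor

class BranchWeightLogitsProcessor(LogitsProcessor):
    """Sketch: bias the logits of branch-starting tokens by log(weight).

    `weighted_token_ids` is a hypothetical mapping from token id to grammar
    weight, e.g. {yes_token_id: 0.5}; it is not part of transformers.
    """

    def __init__(self, weighted_token_ids: dict[int, float]):
        self.bias = {tid: math.log(w) for tid, w in weighted_token_ids.items()}

    def __call__(self, input_ids: torch.LongTensor,
                 scores: torch.FloatTensor) -> torch.FloatTensor:
        # Adding log(w) to a logit multiplies that token's (unnormalized)
        # probability by w, matching the grammar's branch weight.
        for tid, b in self.bias.items():
            scores[:, tid] += b
        return scores
```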

Sorry, I don’t have much to offer on this.