A Valentine's post: The role of symbolic logic in determining positional encodings for Transformers

Hey all,

So this is an idea I’ve held ‘close to my heart’ for a while now, but I’ve realized I have little chance of scaling it without help.

The thinking goes like this: though many now use ‘agents’ to scale computation in LLMs, there might be a more reasonable (or sensible) way, if only we had the right database. (And this is not so crazy: much of LLM post-training consists of a large volume of RLHF, but maybe we are only chasing the tail end of a horse that has already left the gate.)

As a longtime lover of literature, and first and foremost a student of Philosophy, I realize many in this field are, well, perhaps not. Language calls for a certain sensitivity.

In any case, I’ve been amazed at how much a merely ‘required’ Philosophy course (in symbolic logic) has altered my life. I recall thinking, as a student, ‘How am I ever going to use this?’ It was suggested to me at the time that, well, it could be useful if I ever decided to become a lawyer… (And I had no desire to become a lawyer.)

But later on, when I started to study FPGAs: Aha! There they were, your ‘truth tables’, your Boolean representations of a function, and I thought, ‘oh ho’, I have a leg up on this. It made me feel good.
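In case the connection is unfamiliar: a truth table lists a Boolean function’s output for every combination of inputs, and in a LUT-based FPGA a small function is stored more or less as exactly that list of output bits. Below is a minimal Python sketch of the idea. The helper names (`truth_table`, `implies`, `xor`, `as_lut`) are hypothetical, chosen just for illustration, and material implication is used because it is the classic connective from a symbolic logic course.

```python
from itertools import product

def truth_table(fn, n_inputs):
    """List every input combination (False before True) with fn's output."""
    return [(bits, fn(*bits)) for bits in product((False, True), repeat=n_inputs)]

def implies(p, q):
    # Material implication "p -> q": false only when p is true and q is false.
    return (not p) or q

def xor(p, q):
    # Exclusive or: true when exactly one input is true.
    return p != q

def as_lut(table):
    """Pack the output column into an integer: bit i holds the output for row i."""
    value = 0
    for i, (_, out) in enumerate(table):
        if out:
            value |= 1 << i
    return value

for name, fn in [("p -> q", implies), ("p XOR q", xor)]:
    table = truth_table(fn, 2)
    print(name)
    for bits, out in table:
        print("  " + " ".join(str(int(b)) for b in bits), "|", int(out))
    print("  LUT bits:", bin(as_lut(table)))
```

For `p -> q` the packed value is `0b1011`: read right to left, it is the output column for rows 0 through 3, which is roughly how a LUT cell holds a small function in hardware.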

I also recently reached out to my old professor about this very topic. He got back to me once, which was very nice, but I have not heard back since.

His entire textbook is available online for free, though in an antiquated format:

https://courses.umass.edu/phil110-gmh/MAIN/IHome-5.htm

He is an excellent teacher, though, so please, no negative feedback toward him.

My thought was this: sentences can be defined ‘situationally’, but that is not the same as defining them ‘symbolically’, and the symbolic definition is what we would require for what I will call ‘active attention’.

In short: a bit less ‘word salad’.

But composing such a database would take a (really) dedicated team and some time.

Yes, we’ve found you can scour the back ends of the internet and compile something useful, and I confess I still have a bit of a ‘bias’: even your ‘best model’ is still always, always, always dependent on the data you put into it [there is, simply, no way around that], so we ought to think more carefully about curation beyond FineWeb, etc.

Still, even with present models, I think you could build the symbolic logic into the positional encodings, that is, at the start rather than at the end… if only you had the dataset.

And I suppose I am still not really convinced this is the most efficient way to implement ‘all things’, since we are working with a database methodology in Q, K, V… But at least it might help.
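To make the ‘database methodology’ concrete: in standard scaled dot-product attention, each query is compared against every key, the scores are normalized with a softmax, and the output is a weighted blend of the values. It is a soft version of a key-value lookup. The minimal NumPy sketch below shows only that standard mechanism, not the symbolic-logic idea above; the toy shapes, the one-hot-style keys, and the `attention` helper name are assumptions for illustration, not any particular model.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n_queries, n_keys) match scores
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

# Toy "table": 4 records with orthogonal 8-dim keys and 3-dim values.
K = 5.0 * np.eye(4, 8)
V = np.arange(12.0).reshape(4, 3)

# A query that matches record 2's key.
Q = K[2:3]

out, w = attention(Q, K, V)
print(np.round(w, 3))    # weight mass concentrates on key 2
print(np.round(out, 3))  # close to V[2], i.e. nearly a hard lookup
```

A hard database lookup would return exactly `V[2]`; attention returns a blend that is nearly `V[2]` here only because the query matches one key far better than the others.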

I did mention an earlier version of this thought on Discord’s @akakak1337 forum, but it got no traction, so perhaps I am wrong.

I felt, though, that it was finally time to get it off my chest.