On Windows I can’t get good (or any) GPU performance out of Langchain CTransformers, which is why I also tried ctransformers directly. I can’t remember at the moment whether the model actually loads into the GPU (if it does, I can see it in Task Manager), but either way GPU utilization doesn’t spike at all, and generation is slow enough that I know it is running on the CPU only. I’ve followed every instruction I can find, but whereas I can get the GPU working with Langchain GPT4All and with ctransformers on its own, I can’t with Langchain CTransformers.
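For anyone comparing setups: as far as I understand it, Langchain CTransformers only asks for GPU offload if `gpu_layers` is passed inside its `config` dict (the wrapper forwards that dict to ctransformers' `AutoModelForCausalLM.from_pretrained`), and ctransformers itself needs its CUDA extras (`pip install ctransformers[cuda]`). Below is a minimal sketch under those assumptions; the helper names, the model type and the layer count are hypothetical placeholders, not a confirmed fix.

```python
"""Sketch: request GPU offload through Langchain's CTransformers wrapper.

Assumes `pip install langchain-community ctransformers[cuda]` (or an older
`langchain` release that still ships `langchain.llms`). Helper names are
hypothetical.
"""


def ctransformers_kwargs(model_path: str, gpu_layers: int) -> dict:
    """Build constructor arguments for Langchain's CTransformers wrapper.

    `gpu_layers` goes inside `config`, not at the top level; the wrapper
    passes that dict on to ctransformers, which decides how many layers
    run on the GPU.
    """
    return {
        "model": model_path,
        "model_type": "llama",  # assumption: a Llama-family model
        "config": {"gpu_layers": gpu_layers, "max_new_tokens": 256},
    }


def load_ctransformers(model_path: str, gpu_layers: int = 50):
    """Import lazily so the helper above works without the packages installed."""
    try:
        from langchain_community.llms import CTransformers
    except ImportError:  # older Langchain releases
        from langchain.llms import CTransformers
    return CTransformers(**ctransformers_kwargs(model_path, gpu_layers))
```

Calling `load_ctransformers("models/llama-7b.gguf")` (a hypothetical path) and then watching GPU utilization in Task Manager during generation is a quick way to check whether any layers were actually offloaded.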
I finally managed to build llama-cpp-python with GPU support on Windows. It was an absolute nightmare: many people report similar problems, and there are different variations of suggested fixes, some of which work for some people and not for others. My own solution was never mentioned anywhere, which is why it took me longer and why I was exploring other Python GPU options in the meantime.
Langchain llama.cpp fully loads the model into the GPU and executes it there. It is the fastest of the options I have tried so far (though to be fair, Langchain GPT4All is not far behind), but most importantly it doesn’t touch the CPU.
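For comparison, GPU offload through Langchain's llama.cpp wrapper is requested with the `n_gpu_layers` argument of `LlamaCpp`, and it only has an effect if llama-cpp-python was built with GPU support. A minimal sketch, assuming a GPU-enabled build; the helper names, context size and default layer count are hypothetical, and the assumption is that a layer count at or above the model's total offloads the whole model.

```python
"""Sketch: request full GPU offload through Langchain's LlamaCpp wrapper.

Assumes llama-cpp-python was built with GPU support and that
`langchain-community` (or an older `langchain`) is installed.
Helper names are hypothetical.
"""


def llamacpp_kwargs(model_path: str, n_gpu_layers: int) -> dict:
    """Build constructor arguments for Langchain's LlamaCpp wrapper."""
    return {
        "model_path": model_path,
        "n_gpu_layers": n_gpu_layers,  # layers to offload to the GPU
        "n_ctx": 2048,  # context window; an arbitrary example value
        "verbose": True,  # llama.cpp logs its loading details, incl. offloaded layers
    }


def load_llamacpp(model_path: str, n_gpu_layers: int = 100):
    """Import lazily so the helper above works without the packages installed."""
    try:
        from langchain_community.llms import LlamaCpp
    except ImportError:  # older Langchain releases
        from langchain.llms import LlamaCpp
    return LlamaCpp(**llamacpp_kwargs(model_path, n_gpu_layers))
```

With `verbose=True`, the load log should show how many layers ended up on the GPU, which makes it easy to tell a CPU-only build from a working GPU build.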
So I don’t know why CTransformers uses the CPU as well as the GPU. Or, more precisely, why ctransformers does, since I believe Langchain CTransformers relies on ctransformers for its core functionality.
For GPU use, I can’t recommend Langchain CTransformers over Langchain GPT4All or, even better, Langchain llama.cpp. Alternatively, you can use their native, non-Langchain versions.