Langchain Ctransformer GPU performance

Finally managed to build llama-cpp-python with GPU support on Windows. It was an absolute nightmare: many people report similar problems, with different variations of suggested fixes that work for some and not for others. My own solution was never mentioned anywhere, which is why it took me so long and why I ended up exploring other Python GPU options along the way.
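For reference, this is the commonly suggested build route (a sketch of the usual advice, not my actual fix, which I haven't seen documented): pass CUDA flags to CMake during the pip install. Note the flag name depends on the llama-cpp-python version.

```shell
# Commonly suggested GPU build of llama-cpp-python (a sketch, not my fix).
# PowerShell on Windows:
#   $env:CMAKE_ARGS = "-DGGML_CUDA=on"   # older releases used -DLLAMA_CUBLAS=on
#   $env:FORCE_CMAKE = "1"
#   pip install llama-cpp-python --no-cache-dir --force-reinstall
# Equivalent from bash / Git Bash:
CMAKE_ARGS="-DGGML_CUDA=on" FORCE_CMAKE=1 \
  pip install llama-cpp-python --no-cache-dir --force-reinstall
```

If this doesn't work for you, you're in the same boat I was: check which CUDA toolkit and compiler CMake is actually picking up.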

Langchain llama.cpp fully loads the model into the GPU and executes it there. It's the fastest of the options I've tried so far (though to be fair, Langchain GPT4All is not far behind), and most importantly it doesn't touch the CPU.
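A minimal sketch of what full GPU offload looks like with the Langchain llama.cpp wrapper; the model path is hypothetical, and `n_gpu_layers=-1` asks llama.cpp to offload every layer to the GPU (only effective with a GPU-enabled build):

```python
# Sketch of full GPU offload via Langchain's LlamaCpp wrapper.
# The model path below is hypothetical.
llm_kwargs = {
    "model_path": "./models/llama-2-7b-chat.Q4_K_M.gguf",  # hypothetical path
    "n_gpu_layers": -1,   # -1 = offload all layers to the GPU
    "n_ctx": 2048,        # context window size
    "n_batch": 512,       # prompt tokens processed per batch on the GPU
}

# from langchain_community.llms import LlamaCpp
# llm = LlamaCpp(**llm_kwargs)
# print(llm.invoke("Name three planets."))
```

With all layers offloaded, CPU usage during generation should stay near idle, which is exactly the behavior I'm seeing.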

So I don’t know why Langchain CTransformers uses the CPU as well as the GPU. Or, better said, why the ctransformers library uses both the CPU and GPU, since I believe the Langchain CTransformers wrapper relies on ctransformers for its core functionality.
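For comparison, this is roughly how GPU offload is configured for ctransformers (and hence for the Langchain wrapper, which passes a `config` dict through). The model repo name is hypothetical; `gpu_layers` is the knob that is supposed to move layers onto the GPU, yet in my tests the CPU stayed busy regardless:

```python
# Sketch of GPU offload configuration for ctransformers.
# The gpu_layers value and model repo below are illustrative assumptions.
config = {
    "gpu_layers": 50,        # number of layers to offload to the GPU
    "context_length": 2048,  # context window size
}

# Native ctransformers:
# from ctransformers import AutoModelForCausalLM
# llm = AutoModelForCausalLM.from_pretrained(
#     "TheBloke/Llama-2-7B-Chat-GGUF",  # hypothetical model repo
#     model_type="llama",
#     gpu_layers=config["gpu_layers"],
# )

# Via Langchain:
# from langchain_community.llms import CTransformers
# llm = CTransformers(model="TheBloke/Llama-2-7B-Chat-GGUF",
#                     model_type="llama", config=config)
```

Even with a high `gpu_layers` value, I still see significant CPU load, which is the behavior I can't explain.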

So I can’t recommend Langchain CTransformers for GPU use over Langchain GPT4All, or better yet Langchain llama.cpp. Or you can use their native, non-Langchain versions.