Local LLM setup advice / LangChain (guidance needed)

Hi,

I am quite new to AI and am having trouble working out the best way to set up and run local models, so I would appreciate some pointers.

Context:
I want to start learning LangChain/LCEL and build some agents, etc.

Previously, I built a custom RAG solution that connected to Ollama models fine. But since my custom setup does not support bind(), I am not able to use it with LangChain.
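From what I have read, the official langchain-ollama wrapper implements the LCEL Runnable interface, so bind() (and bind_tools()) should work there out of the box. This is roughly the pattern I am aiming for (model name is just an example, and I may be misunderstanding something):

```python
# Rough pattern I am aiming for (model name is just an example).
# pip install langchain-ollama
from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.1")      # any model pulled with `ollama pull`
bound = llm.bind(stop=["\n\n"])         # bind() attaches default call kwargs
print(bound.invoke("Name one LCEL primitive.").content)
```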

I ran the question through an AI assistant and it suggested LM Studio. I previously had issues with GGUF models there (and in the desktop app, removing the GGUF filter on model search shows zero results).

But, to be frank, the answers I received did not sound very 'certain'; it felt more like guessing once the LM Studio setup started to get complex.

What is the standard/best-practice toolset and setup for storing and running local LLMs?
I would like to get this locked down to the right tools and approaches, as it is an essential area that I am still not quite sure about.

If anyone has any advice, links to guides, etc, it would be appreciated.

I have a local setup with an RTX 5090 (any advice on PyTorch and TensorFlow setup would also be appreciated) and an A5000 eGPU (which I plan to sell once I get things working on the 5090!), plus 192 GB of RAM, etc. So running locally should be within reach.
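For the 5090 specifically, my understanding is that Blackwell needs the CUDA 12.8 PyTorch builds, and that older wheels fail with a "no kernel image" error. This is the sanity check I am planning to run once installed; am I missing anything?

```python
# Sanity check that PyTorch sees the 5090 (my assumption: Blackwell needs
# the cu128 build):
#   pip install torch --index-url https://download.pytorch.org/whl/cu128
import torch

print(torch.__version__, torch.version.cuda)   # expecting a CUDA 12.8 build
print(torch.cuda.is_available())               # should be True
print(torch.cuda.get_device_name(0))           # should show the RTX 5090
x = torch.randn(1024, 1024, device="cuda")
print((x @ x).sum().item())                    # simple kernel-launch test
```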

Thanks in advance.

Update: I am now going the vLLM route, but would still appreciate guidance if anyone can offer it.
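My rough plan, based on the docs as I understand them, is to run vLLM's OpenAI-compatible server and point LangChain's OpenAI client at it (the model name below is just a placeholder). Does this look right?

```python
# Step 1 (separate terminal): start vLLM's OpenAI-compatible server, e.g.
#   vllm serve meta-llama/Llama-3.1-8B-Instruct
# Step 2: point LangChain's OpenAI client at the local endpoint.
# pip install langchain-openai
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's default endpoint
    api_key="not-needed",                 # ignored unless the server sets --api-key
    model="meta-llama/Llama-3.1-8B-Instruct",
)
print(llm.invoke("Say hello from vLLM.").content)
# bind()/bind_tools() should work here too, since ChatOpenAI is a Runnable.
```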