Chatbot with Llama 2

Can anyone help me modify the notebook to build a chatbot with Llama 2? Basically, how should the get_completion_from_messages method be changed? All the examples on the internet use the OpenAI API in LangChain, not other APIs like Anthropic, and none show how to use Llama 2.

Hi @SaraAhmadi , welcome to the community!
As you know, Llama 2 is an open-source LLM, not a hosted service like OpenAI's GPT models. This means there is no official API the notebook can point to in order to run inference on Llama 2. You would first need to host it somewhere yourself, which can be expensive and tedious.
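To make that concrete: once you do have the weights available (on your own GPU machine or a rented one), a rough, untested sketch of how get_completion_from_messages might be rewritten on top of the transformers library could look like this. The model id and generation parameters are assumptions, and the chat formatting relies on the tokenizer shipping a chat template (recent transformers versions); otherwise you would build the `[INST] ... [/INST]` prompt by hand.

```python
# Rough sketch, not tested: replaces the OpenAI call with a locally loaded Llama 2 chat model.
# Assumes the gated meta-llama/Llama-2-7b-chat-hf weights are accessible and a GPU is available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"  # assumed model id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

def get_completion_from_messages(messages, temperature=0.0, max_tokens=500):
    # Build the Llama 2 chat prompt from the OpenAI-style messages list.
    # Requires the tokenizer to define a chat template; on older setups,
    # construct the <<SYS>> / [INST] prompt manually instead.
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    generate_kwargs = dict(max_new_tokens=max_tokens)
    if temperature > 0:
        generate_kwargs.update(do_sample=True, temperature=temperature)

    output_ids = model.generate(**inputs, **generate_kwargs)
    # Return only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```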
Please mark this reply as the Solution if you think it is. Thanks!


Adding to @carloshvp's reply: there are many articles on the internet that demonstrate how to run Llama 2 on Google Colab. Please refer to them; however, this is slightly more complex than a direct API call to the model.
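For reference, the Colab recipes those articles describe usually boil down to something like the following (untested sketch; a free Colab GPU generally needs 4-bit quantization to fit the 7B chat model, and access to the gated meta-llama repo requires accepting Meta's license and logging in with a Hugging Face token):

```python
# Untested sketch: load Llama-2-7b-chat on a Colab GPU with 4-bit quantization.
# Assumes: pip install transformers accelerate bitsandbytes, plus a Hugging Face
# token with access to the gated meta-llama repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-chat-hf"
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # places layers on the Colab GPU automatically
)
```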

Exactly @raj, I agree with you. In addition, Google Colab sessions are ephemeral instances that get automatically switched off after a few minutes of inactivity.


Yup, exactly. Great for experimentation.

I have Python code that calls the model (Llama-2-chat) through Hugging Face, and I run the code on my own server, which has powerful GPUs. Do you know whether our data (when I pass it through the model) is shared with Hugging Face? I am asking because data privacy matters to me, and that is the reason I want to use open-source models.

Hi @SaraAhmadi ,
If data confidentiality is a concern, I am afraid you will need to host the Llama 2 model yourself, or read and trust the contractual agreements you have with Hugging Face as a user (i.e. the EULA). You can read the terms of service here: Terms of Service – Hugging Face
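One clarification that may help (my understanding, worth double-checking): whether your prompts reach Hugging Face's servers depends on how the code calls the model. If it downloads the weights and runs generation locally with transformers, the prompt text stays on your machine; if it goes through the hosted Inference API, the prompt is sent to Hugging Face over HTTPS. A rough sketch of the two patterns, not a statement about Hugging Face's data-handling policies:

```python
# Sketch (untested) contrasting local inference with the hosted Inference API.
from transformers import pipeline
from huggingface_hub import InferenceClient

# 1) Local inference: weights are downloaded once, prompts never leave your server.
local_llm = pipeline(
    "text-generation", model="meta-llama/Llama-2-7b-chat-hf", device_map="auto"
)
print(local_llm("[INST] Hello [/INST]", max_new_tokens=30)[0]["generated_text"])

# 2) Hosted Inference API: the prompt is sent to Hugging Face's servers.
client = InferenceClient(model="meta-llama/Llama-2-7b-chat-hf")
print(client.text_generation("[INST] Hello [/INST]", max_new_tokens=30))
```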

I want to host the model myself, but I don't know how to do that. All the tutorials on the internet use it through Hugging Face.

Hi @SaraAhmadi ,
Self-hosting on your own premises (e.g. on a PC) is usually not realistic unless you have a significant budget for the required hardware. But you can host it on public cloud providers like AWS, Azure, or GCP; they all offer managed services for straightforward deployment of popular models.

I have access to powerful GPUs, but I don't know how to load the model locally. I mean, which files should I download from Hugging Face, and how do I load them to work with the model locally? That is my question. If anyone knows, I would appreciate the help.

@carloshvp Do you know which files I need to download from Hugging Face for open-source models (meta-llama/Llama-2-7b-chat-hf at main), and how to load them on my local server?
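Not an official answer, but the usual pattern is: from the repo you need the config, tokenizer, and weight files (roughly config.json, generation_config.json, tokenizer.model, tokenizer.json, tokenizer_config.json, special_tokens_map.json, and the safetensors shards plus their index). The easiest route is to let snapshot_download grab the whole repo once and then load everything from the local folder. A rough, untested sketch (folder path is an assumption):

```python
# Untested sketch: download the repo once, then load the model purely from disk
# so inference runs entirely on your own server.
import torch
from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

# One-time download of the gated repo (requires accepting Meta's license and a HF token).
local_dir = snapshot_download(
    repo_id="meta-llama/Llama-2-7b-chat-hf",
    local_dir="./llama-2-7b-chat-hf",  # assumed target folder on your server
)

# Load tokenizer and weights from the local folder.
tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModelForCausalLM.from_pretrained(
    local_dir,
    torch_dtype=torch.float16,
    device_map="auto",  # spreads the model across your GPUs
)

prompt = "[INST] Say hello in one sentence. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```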