Chat with LLM, FALCON model for local API

If FALCON 40B is not available to you or is too large to run, you can use the FALCON API below for local execution.

The responses are good enough for local testing.
Let me know if you find any other free FALCON Inference API for testing.

I have pointed HF_API_FALCOM_BASE to, but when I run
prompt = "Has math been invented or discovered?"
client.generate(prompt, max_new_tokens=256).generated_text
I'm getting an error: BadRequestError: Authorization header is invalid, use 'Bearer API_TOKEN'

Any idea on how to fix it?

I got it working.
I changed the previous line, where the client is initialized, so that it passes the token in the Authorization header:
client = Client(os.environ['HF_API_FALCOM_BASE'], headers={"Authorization": f"Bearer {hf_api_key}"}, timeout=120)
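For anyone hitting the same error, here is a minimal sketch of the whole flow. It assumes that Client comes from the text_generation package (the thread does not say which library is used) and that the token is stored in an environment variable named HF_API_KEY, which is a made-up name for illustration. build_auth_headers and make_client are hypothetical helpers, not part of any library.

```python
import os


def build_auth_headers(token: str) -> dict:
    """Build the header the error message asks for: 'Bearer <token>'."""
    token = token.strip()
    if not token:
        raise ValueError("API token is empty")
    return {"Authorization": f"Bearer {token}"}


def make_client():
    # Imported lazily so the helper above works without the package installed.
    # Assumption: the client is text_generation.Client (pip install text-generation).
    from text_generation import Client

    base_url = os.environ["HF_API_FALCOM_BASE"]  # endpoint URL, named as in the thread
    hf_api_key = os.environ["HF_API_KEY"]        # hypothetical variable name
    return Client(base_url, headers=build_auth_headers(hf_api_key), timeout=120)


if __name__ == "__main__":
    client = make_client()
    prompt = "Has math been invented or discovered?"
    print(client.generate(prompt, max_new_tokens=256).generated_text)
```

Keeping the header construction in its own function makes it easy to check that the token is non-empty and correctly prefixed before any request is sent.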



How did you know what HF_API_FALCOM_BASE should point to for the Falcon 7b-instruct model?

I've tried downloading the 7b model, but it takes quite a while.

@Raja_Sekhar How did you find the API end-point for the Falcon 7b-instruct model?


Go to and search for Falcon 7b instruct to find the model, then click on deployment → Inference API and you will see the link.
