LangChain "Chat with Your Data", taught by Harrison, is a great lecture. However, I want to implement it with an open-source LLM such as Falcon, Llama, or another model. Can anyone guide me on how to proceed?
My main goal is to create a mobile application with a chat feature. I am thinking of running a fine-tuned LLM on the backend.
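For a mobile app like this, the backend usually exposes a small HTTP API that the app calls, and the LLM sits behind it. Below is a minimal sketch using only Python's standard library. It assumes a hypothetical JSON contract (`{"message": ...}` in, `{"reply": ...}` out) on a hypothetical `/chat` route, and `generate` is a placeholder standing in for whichever fine-tuned model you deploy.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable, Tuple


def generate(prompt: str) -> str:
    """Hypothetical placeholder: replace with a call to your (fine-tuned) LLM."""
    return f"(model reply to: {prompt})"


def handle_chat(raw_body: bytes, llm: Callable[[str], str]) -> Tuple[int, dict]:
    """Validate the app's JSON request and return (HTTP status, JSON payload)."""
    try:
        message = json.loads(raw_body or b"{}")["message"]
    except (ValueError, KeyError, TypeError):  # bad JSON, missing key, or not an object
        message = None
    if not isinstance(message, str) or not message.strip():
        return 400, {"error": "expected JSON with a non-empty 'message' string"}
    return 200, {"reply": llm(message)}


class ChatHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/chat":  # hypothetical route name
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_chat(self.rfile.read(length), generate)
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


if __name__ == "__main__":
    # Port 8000 is an arbitrary choice for local testing.
    ThreadingHTTPServer(("0.0.0.0", 8000), ChatHandler).serve_forever()
```

Keeping the model behind your own API means the app does not have to change if you later swap a local model for a hosted endpoint; only `generate` changes.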
There are a few ways you can do this:
from langchain.llms.huggingface_pipeline import HuggingFacePipeline
This loads the model locally, and unless you have a powerful GPU such as an A100 or V100, running inference will be a problem. There are ways to quantize the model (for example to 8-bit or 4-bit) so that it fits on smaller hardware, though.
from langchain.llms import HuggingFaceHub
This uses a hosted inference endpoint from Hugging Face, so you don't need to download the model to disk.
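To see why quantization helps, here is a toy sketch of symmetric (absmax) 8-bit quantization on a plain list of weights: each float is stored as a small integer plus one shared scale, so it takes 1 byte instead of 2 (fp16) or 4 (fp32). This only illustrates the idea; libraries such as bitsandbytes or GPTQ use more refined schemes, and the weights below are made up.

```python
from typing import List, Tuple


def quantize_absmax_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one scale for the whole list."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate floats; the error per weight is at most scale / 2."""
    return [v * scale for v in q]


if __name__ == "__main__":
    weights = [0.12, -0.5, 0.031, 0.88, -0.27]  # made-up example weights
    q, scale = quantize_absmax_int8(weights)
    print(q)
    print([round(r, 3) for r in dequantize(q, scale)])
```

The trade-off is a small rounding error per weight in exchange for a 2x to 4x smaller footprint, which is what lets a large model fit on a GPU that could not hold it in full precision.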
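A hosted endpoint is just an HTTP API, so it can be called without loading any weights locally. Below is a minimal standard-library sketch. It assumes the serverless Inference API URL pattern `https://api-inference.huggingface.co/models/<model_id>`, a token in an `HF_TOKEN` environment variable (an assumed name), and the text-generation response shape `[{"generated_text": ...}]`. Hugging Face has changed these endpoints over time, so check the current docs before relying on them. `tiiuae/falcon-7b-instruct` is only an example model id.

```python
import json
import os
import urllib.error
import urllib.request
from typing import Any

# Assumed URL pattern for Hugging Face's serverless Inference API; verify against current docs.
API_URL = "https://api-inference.huggingface.co/models/{model_id}"


def build_request(model_id: str, prompt: str, token: str, max_new_tokens: int = 256) -> urllib.request.Request:
    """Build a POST request for a text-generation model (parameter names assumed from the HF docs)."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "return_full_text": False},
    }
    return urllib.request.Request(
        API_URL.format(model_id=model_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def parse_response(body: Any) -> str:
    """Extract the generated text, surfacing API errors instead of failing silently."""
    if isinstance(body, dict) and "error" in body:
        raise RuntimeError(f"Inference API error: {body['error']}")
    if isinstance(body, list) and body and isinstance(body[0], dict) and "generated_text" in body[0]:
        return body[0]["generated_text"]
    raise ValueError(f"Unexpected response shape: {body!r}")


def generate(prompt: str, model_id: str = "tiiuae/falcon-7b-instruct") -> str:
    """Send the prompt and return the completion; HF_TOKEN is an assumed env var name."""
    req = build_request(model_id, prompt, os.environ["HF_TOKEN"])
    try:
        with urllib.request.urlopen(req, timeout=60) as resp:
            return parse_response(json.load(resp))
    except urllib.error.HTTPError as err:
        # Non-2xx replies (e.g. a model that is still loading) usually carry a JSON "error" field.
        return parse_response(json.load(err))
```

A function like `generate` is what you would then plug into your document pipeline, which only needs something that turns a prompt string into a completion string. Printing the raised error message is often the quickest way to see why a call "is not working" (bad token, model still loading, wrong model id).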
It's not working. I am unable to integrate the Hugging Face inference endpoint with the LangChain pipeline where I have processed my documents.
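Whichever backend you pick, the "chat with your data" part does not depend on the model: retrieve the chunks most relevant to the question, paste them into a prompt, and send that prompt to the LLM. Below is a minimal, framework-free sketch of that flow, assuming the documents are already split into text chunks. The word-overlap scoring is a deliberately simple stand-in for the embedding similarity search a vector store would do, and `generate` is a hypothetical callable wrapping whichever LLM you use.

```python
import re
from typing import Callable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase word set used for the toy relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_k_chunks(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Stand-in for vector search: rank chunks by words shared with the question."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:k]


PROMPT = (
    "Use the following context to answer the question. "
    "If the answer is not in the context, say you don't know.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)


def answer(question: str, chunks: List[str], generate: Callable[[str], str], k: int = 2) -> str:
    """Stuff the top-k chunks into the prompt and hand it to any LLM callable."""
    context = "\n---\n".join(top_k_chunks(question, chunks, k))
    return generate(PROMPT.format(context=context, question=question))
```

In LangChain the same roles are played by a retriever over a vector store and a QA chain; the point is that the LLM only ever receives a prompt string, so a local model or a hosted endpoint can be swapped in as `generate` without touching the document processing.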
I think you have to pay for the inference endpoint, just as you pay for the OpenAI API; in return, it keeps you away from the MLOps side of things. So I suggest downloading the model from the Hugging Face Hub and running it locally, but this will not be possible if you have limited resources and the model you want to download is a large one. Check this link out for more information on how to go about loading a larger model. I hope this helps.
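Whether downloading a model is feasible mostly comes down to memory. A rough, weights-only estimate is parameter count times bytes per parameter. This sketch ignores activations, the KV cache, and framework overhead (so real usage is higher); the 7B and 40B parameter counts, the 16 GB GPU, and the 80% headroom rule are illustrative assumptions, not measured figures.

```python
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Weights-only footprint in GB (10**9 bytes); activations and KV cache not included."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def fits(num_params: float, dtype: str, gpu_gb: float, headroom: float = 0.8) -> bool:
    """Hypothetical rule of thumb: weights may use at most `headroom` of GPU memory."""
    return weight_memory_gb(num_params, dtype) <= gpu_gb * headroom


if __name__ == "__main__":
    for params in (7e9, 40e9):
        for dtype in BYTES_PER_PARAM:
            gb = weight_memory_gb(params, dtype)
            print(f"{params / 1e9:.0f}B params @ {dtype}: ~{gb:.1f} GB, fits in 16 GB: {fits(params, dtype, 16)}")
```

For example, a 7B-parameter model needs roughly 14 GB for fp16 weights but roughly 3.5 GB at 4-bit, which is why quantization makes smaller GPUs usable; if even the quantized weights do not fit, a hosted endpoint is the practical option.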