sentence_window_engine = get_sentence_window_query_engine(sentence_index) -- ValueError: Couldn't instantiate the backend tokenizer

  1. I am running the code in a notebook on my local machine, which has 16 GB RAM and an NVIDIA Quadro P1000 GPU with 4 GB VRAM.

I am running Python 3.11.

When I try to run the line:

```python
sentence_window_engine = get_sentence_window_query_engine(sentence_index)
```

I get the error message:

```
ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
```

I have installed sentencepiece.
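
A minimal sanity check, run in the same kernel as the notebook, to confirm the installed sentencepiece is actually visible to it (if the import fails, the package landed in a different Python environment; restarting the kernel after installing also helps):

```python
# Run this in the same kernel that raises the ValueError; an ImportError here
# means sentencepiece was installed into a different Python environment.
import sentencepiece
print(sentencepiece.__version__)
```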

  2. Furthermore, I wonder whether I can run the code on my machine at all, given these specs, or whether there is an alternative with a smaller model, e.g. a quantized version.

  3. When I try to run the Jupyter Notebook in Google Colab, I get a persistent error.

I run the code:

```python
import openai

# os.environ["OPENAI_API_KEY"] = "..."
openai.api_key = "..."
```

(The OpenAI API key I use is correct.)

I get the error message:

```
OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
```
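
One common cause, as I understand it: with openai>=1.0 the key is read when a client object is constructed, so setting `openai.api_key` does not reach clients that libraries build internally. A minimal sketch of the environment-variable route the error message itself suggests (the key string is a placeholder):

```python
import os

# Set the variable before any OpenAI client is constructed; clients created
# later (including those built inside other libraries) read OPENAI_API_KEY then.
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder; substitute your real key
```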

Alexander, in answer to your second question: I could run this with the llama-cpp-python module (GitHub - abetlen/llama-cpp-python: Python bindings for llama.cpp) on CPU alone. CPU + GPU (with 6 GB VRAM) was actually slower for me. On CPU alone I get about 6 tokens/sec; on one of my Macs (an M1 Pro) I can reach 20 tokens/sec. My target is to run the Llama 2 13B parameter model. Code snippet below:
```python
# LLM based on the 13B Llama 2 model, loaded as a 4-bit quantized GGUF file
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="d:/srccode/data/ML/ggml-model-q4_0.gguf",
    temperature=0.0,  # deterministic sampling
    n_threads=8,      # CPU threads used for inference
    top_p=1,
    n_ctx=3900,       # context window in tokens
    verbose=True,
)
```
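
A quick smoke test to confirm the model file loads and generates (this assumes a LangChain version where the LLM object is directly callable; newer versions prefer `llm.invoke(...)`):

```python
# One short completion to verify that the GGUF file loads and tokens are produced.
print(llm("Q: Name one planet in the solar system. A:"))
```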