Context Window (memory)

Can an LLM relate different context windows (memories) within one session by default, or do we need to do something to make that happen? From my experience working with GPT-3, the behavior is inconsistent: sometimes it remembers and at other times it does not. I notice this when I say "based on the information given previously, do this task." Thank you in advance.

1 Like

Chatbots such as GPT-3 have predefined context windows. The inconsistency may arise from the way your prompt is structured.

For example, you may want it to perform a task based on your prompt and some additional text, but the task may be misinterpreted when the model is reading a different part of that text.

A solution, or good practice, is to separate the two. Example: "Write a summary of the text within the triple backticks: ```{the text}```". This way you have separated the "action" from the "text" it needs to perform the action on.
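That separation can be done mechanically when building prompts in code. A minimal sketch (the helper name `build_prompt` is hypothetical, not from any library) of delimiting the text with triple backticks:

```python
# Hypothetical helper: keep the "action" (instruction) separate from
# the "text" it operates on by wrapping the text in triple backticks.
def build_prompt(instruction: str, text: str) -> str:
    """Combine an instruction and its target text behind a clear delimiter."""
    return f"{instruction} within the triple backticks:\n```\n{text}\n```"

prompt = build_prompt("Write a summary of the text",
                      "LLMs have fixed context windows.")
print(prompt)
```

This way the model can always tell where the instruction ends and the material begins, whatever the material contains.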

Hopefully, this helps you with the inconsistency issue.

1 Like

How many tokens/words were in the history when you saw the inconsistency? The maximum context is 16k tokens now; before, it was only 4k.

Thank you very much for your response.
I have 6,991 tokens and 31,405 characters (31,405 B / 1,024 ≈ 30.7 KB, assuming each character is encoded/stored as one byte).
Assuming 30.7 KB is calculated correctly, it is way more than 16 KB.
Just to give you some context: the task I am working on requires mapping a description of something (say X) to one of 31 types, and based on the description the LLM is expected to map this X to exactly one of the 31 types. Each type description averages 160 to 320 tokens (1.3 to 1.529 KB per type).
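The setup described above can be sketched as a single prompt that carries all 31 type descriptions plus the item to classify, which is why the token count grows so quickly. Everything here (names, placeholder descriptions, wording) is an assumption for illustration, not the poster's actual prompt:

```python
# Hypothetical sketch: 31 type descriptions plus one item description
# in a single classification prompt. Placeholder text throughout.
type_descriptions = {f"Type {i}": f"(description of type {i})"
                     for i in range(1, 32)}

def build_classification_prompt(item_description: str) -> str:
    """Build one prompt asking the model to pick exactly one type."""
    types_block = "\n".join(f"- {name}: {desc}"
                            for name, desc in type_descriptions.items())
    return ("Map the item below to exactly one of these 31 types.\n"
            f"{types_block}\n"
            "Item:\n```\n" + item_description + "\n```\n"
            "Answer with the type name only.")

prompt = build_classification_prompt("A description of X goes here.")
```

With 160 to 320 tokens per type, the 31 type descriptions alone contribute roughly 5,000 to 10,000 tokens before the item description and instructions are added.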

1 Like

Thanks a lot. I will try it out.

1 Like

I don't think 1 token is 1 character. If you are using the API, there is a Python library called tiktoken that lets you count the tokens in a text. Here is a simple Python script to count tokens:

import tiktoken

def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(string))

text = "This is a simple test. We're testing tokenization with tiktoken. Tokenization is fun!"
# GPT-3 should use this encoding; not sure about GPT-4
encoding_name = "gpt2"
if encoding_name in tiktoken.list_encoding_names():
    print(num_tokens_from_string(text, encoding_name))
else:
    print("Encoding name not valid!")

Thank you very much.
I used the OpenAI Platform tokenizer to count the number of tokens; it reports both the token count and the number of characters. Since the measure was "16k" and computer memory on the x86 architecture stores data in bytes, I calculated KB = number of characters / 1,024.
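Note that the 16k limit is measured in tokens, not kilobytes, and tokens are not the same as characters or bytes. A common rule of thumb (an assumption, not an exact rule: roughly 4 characters per token for English text) shows why 31,405 characters came out near 7,000 tokens:

```python
# Rough rule of thumb (an assumption, not exact): for English text,
# one token is about 4 characters, so a token count is far smaller
# than a character or byte count.
def estimate_tokens(num_chars: int) -> int:
    """Very rough token estimate from a character count."""
    return round(num_chars / 4)

print(estimate_tokens(31405))  # 7851, in the same ballpark as the 6,991 measured
```

So the 6,991-token history fits inside a 16k-token context window with room to spare; dividing characters by 1,024 compares the wrong units.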