Hi all,
I needed to change this Llama 3.2 image reasoning example to ask the question "what is the capital of USA".
Below is the image example that I got from the HF model card.
I solved it using this code:
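For context, the snippet further down assumes the model, processor, and image have already been loaded as in that model card example. A minimal sketch of such a setup (the model id follows the HF model card for meta-llama/Llama-3.2-11B-Vision-Instruct; the image URL is only a placeholder, not from the original post):

import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# Placeholder image; any PIL image works here
url = "https://example.com/some_image.jpg"
image = Image.open(requests.get(url, stream=True).raw)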
# Build the chat message: one image placeholder plus the text question
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "what is written in the image: "}
    ]}
]
# Render the chat template into a prompt string, then prepare model inputs
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(
    image,
    input_text,
    add_special_tokens=False,
    return_tensors="pt"
).to(model.device)
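From there, a minimal sketch of running generation and decoding the reply (max_new_tokens is just an illustrative value); to ask a different question such as "what is the capital of USA", only the "text" field in messages needs to change:

# Generate the reply and decode only the newly generated tokens
output = model.generate(**inputs, max_new_tokens=64)
generated = output[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(generated, skip_special_tokens=True))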