Need Help Fine-Tuning a Mamba Model with Hugging Face Transformers

Hey community!

I’m working on fine-tuning the Mamba model (specifically state-spaces/mamba-2.8b-hf) for a multi-turn dialogue system, but I’m hitting some roadblocks. My goal is to build a chatbot that retains context across the turns of a conversation, like:

Input > Dialogue1: Hi! Can you recommend a pizza place?
Dialogue2: Sure! Are you looking for vegan options?
Dialogue3: Yes, preferably near downtown.

Output > [Bot]: [Expected Response]

My Setup:

  • Using Hugging Face Transformers and PEFT for LoRA (rough loading sketch just below).
  • Training on custom conversational data.
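
For reference, here’s roughly how I load the model and tokenizer (a minimal sketch of my setup, nothing custom yet):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "state-spaces/mamba-2.8b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# tokenizer.eos_token is "<|endoftext|>", which is what I use as the separator
model = AutoModelForCausalLM.from_pretrained(model_id)
```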

Specific Questions:

  1. Data Formatting:
    How should I structure multi-turn dialogues? I’m using <|endoftext|> as a separator (the eos token for state-spaces/mamba-2.8b-hf), but the model ignores past turns.
    Should I prepend [User]/[Bot] labels or use special tokens? (There’s a sketch of my current formatting right after this list.)
  2. LoRA Targets:
    Which Mamba layers should I adapt? I’m currently targeting x_proj, in_proj, and out_proj (config sketch also below).
    Is r=8 sufficient for conversational tasks?
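
To make question 1 concrete, here’s a minimal sketch of how I’m building the training strings right now; `format_dialogue` and the [User]/[Bot] labels are purely my own convention, not special tokens the model knows about:

```python
EOS = "<|endoftext|>"  # eos token for state-spaces/mamba-2.8b-hf

def format_dialogue(turns):
    """My current formatting: [Speaker] labels, EOS between turns.
    `turns` is a list of (speaker, text) pairs."""
    return EOS.join(f"[{speaker}]: {text}" for speaker, text in turns) + EOS

text = format_dialogue([
    ("User", "Hi! Can you recommend a pizza place?"),
    ("Bot", "Sure! Are you looking for vegan options?"),
    ("User", "Yes, preferably near downtown."),
])
# My worry: with EOS between turns, the model may treat every turn as a
# separate document, which could explain why it ignores past turns.
# Should I keep EOS only at the very end of each conversation instead?
```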
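
And for question 2, this is essentially my current PEFT config (lora_alpha and lora_dropout are just guesses on my part):

```python
from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                       # is this enough for conversational fine-tuning?
    lora_alpha=16,             # guess: 2 * r
    lora_dropout=0.05,
    target_modules=["x_proj", "in_proj", "out_proj"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check on how much is trainable
```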

Code Snippet (Training Args):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./mamba-dialogue-lora",  # placeholder output path
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch size of 8 per device
    learning_rate=3e-5,
    fp16=True,
)
```
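
And here’s roughly how I wire everything together; `raw_dataset` below is just a stand-in for my actual conversational data:

```python
from datasets import Dataset
from transformers import Trainer, DataCollatorForLanguageModeling

# Stand-in for my real dataset: one "text" column holding the strings
# produced by format_dialogue() above.
raw_dataset = Dataset.from_dict({"text": [text]})

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for batch padding

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized_ds = raw_dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```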

I’m having a hard time getting the fine-tuning code for mamba-2.8b right: either it errors out, or it trains but doesn’t actually fine-tune properly.

Any tips on architecture tweaks, data prep, evaluation strategies, or pointers to example code/documentation?