Need Help Fine-Tuning a Mamba Model with Hugging Face Transformers

Hey community!

I’m working on fine-tuning the Mamba model (specifically state-spaces/mamba-2.8b-hf) for a multi-turn dialogue system, but I’m hitting some roadblocks. My goal is to build a chatbot that retains context across the turns of a conversation, like:

Input > Dialogue1: Hi! Can you recommend a pizza place?
Dialogue2: Sure! Are you looking for vegan options?
Dialogue3: Yes, preferably near downtown.

Output > [Bot]: [Expected Response]

My Setup:

  • Using Hugging Face Transformers and PEFT for LoRA (rough loading sketch just below).
  • Training on custom conversational data.
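
For reference, here’s roughly how I load the model and tokenizer (a minimal sketch of my setup, nothing custom yet):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "state-spaces/mamba-2.8b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# tokenizer.eos_token is "<|endoftext|>", which is what I use as the separator
model = AutoModelForCausalLM.from_pretrained(model_id)
```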

Specific Questions:

  1. Data Formatting:
    How should I structure multi-turn dialogues? I’m using <|endoftext|> as a separator (the eos token for state-spaces/mamba-2.8b-hf), but the model ignores past turns.
    Should I prepend [User]/[Bot] labels or use special tokens? (There’s a sketch of my current formatting right after this list.)
  2. LoRA Targets:
    Which Mamba layers should I adapt? I’m currently targeting x_proj, in_proj, and out_proj (config sketch also below).
    Is r=8 sufficient for conversational tasks?
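
To make question 1 concrete, here’s a minimal sketch of how I’m building the training strings right now; `format_dialogue` and the [User]/[Bot] labels are purely my own convention, not special tokens the model knows about:

```python
EOS = "<|endoftext|>"  # eos token for state-spaces/mamba-2.8b-hf

def format_dialogue(turns):
    """My current formatting: [Speaker] labels, EOS between turns.
    `turns` is a list of (speaker, text) pairs."""
    return EOS.join(f"[{speaker}]: {text}" for speaker, text in turns) + EOS

text = format_dialogue([
    ("User", "Hi! Can you recommend a pizza place?"),
    ("Bot", "Sure! Are you looking for vegan options?"),
    ("User", "Yes, preferably near downtown."),
])
# My worry: with EOS between turns, the model may treat every turn as a
# separate document, which could explain why it ignores past turns.
# Should I keep EOS only at the very end of each conversation instead?
```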
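
And for question 2, this is essentially my current PEFT config (lora_alpha and lora_dropout are just guesses on my part):

```python
from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                       # is this enough for conversational fine-tuning?
    lora_alpha=16,             # guess: 2 * r
    lora_dropout=0.05,
    target_modules=["x_proj", "in_proj", "out_proj"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check on how much is trainable
```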

Code Snippet (Training Args):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./mamba-dialogue-lora",  # placeholder output path
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch size of 8 per device
    learning_rate=3e-5,
    fp16=True,
)
```
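
And here’s roughly how I wire everything together; `raw_dataset` below is just a stand-in for my actual conversational data:

```python
from datasets import Dataset
from transformers import Trainer, DataCollatorForLanguageModeling

# Stand-in for my real dataset: one "text" column holding the strings
# produced by format_dialogue() above.
raw_dataset = Dataset.from_dict({"text": [text]})

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for batch padding

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized_ds = raw_dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```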

I’m having a hard time getting the fine-tuning code for mamba-2.8b right: either it errors out, or it trains but doesn’t actually fine-tune properly.

Any tips on architecture tweaks, data prep, evaluation strategies, or pointers to example code/documentation?