Seeking Suggestions: Storytelling Transformer Architecture Optimization

Hello everyone!

I’m a learner and I want to try training a storytelling AI (a decoder-only Transformer). The goal is to balance size, efficiency, and performance while keeping it trainable on the Colab free tier (T4, 16 GB VRAM) within a month.

  • Current Architecture (a PyTorch sketch follows after this list):
    • Layers: 10
    • Embedding Size: 512
    • Heads: 8 (64 dim/head)
    • FFN Size: 2,048
    • Vocab Size: 16,000
    • Context Window: 1,024
    • Attention: MLA (multi-head latent attention)
    • Params: ~100-125M
    • Memory: ~1-1.2 GB (FP16) with checkpointing
  • Training:
    • Dataset: Not decided yet, around 20GB…
    • Batch Size: 16
    • Time: unknown.
    • Epochs: 7-8
    • Schedule: around a month
  • Overall:
    • Size: ~1 GB
    • :slight_smile:
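
For anyone who wants to poke at the exact shapes, here is a minimal PyTorch sketch of the config above. Standard causal self-attention stands in for MLA here, and the names (`StoryConfig`, `StoryDecoder`) are just placeholders I made up, not an existing library:

```python
# Minimal sketch of the planned decoder-only model.
# Standard causal self-attention stands in for MLA; names are placeholders.
from dataclasses import dataclass

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


@dataclass
class StoryConfig:
    n_layers: int = 10
    d_model: int = 512
    n_heads: int = 8              # 512 / 8 = 64 dims per head
    d_ffn: int = 2048
    vocab_size: int = 16_000
    max_seq_len: int = 1024
    dropout: float = 0.1
    grad_checkpoint: bool = True  # recompute activations to save VRAM


class Block(nn.Module):
    def __init__(self, cfg: StoryConfig):
        super().__init__()
        self.ln1 = nn.LayerNorm(cfg.d_model)
        self.attn = nn.MultiheadAttention(
            cfg.d_model, cfg.n_heads, dropout=cfg.dropout, batch_first=True
        )
        self.ln2 = nn.LayerNorm(cfg.d_model)
        self.ffn = nn.Sequential(
            nn.Linear(cfg.d_model, cfg.d_ffn),
            nn.GELU(),
            nn.Linear(cfg.d_ffn, cfg.d_model),
            nn.Dropout(cfg.dropout),
        )

    def forward(self, x, causal_mask):
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=causal_mask, need_weights=False)
        x = x + a
        return x + self.ffn(self.ln2(x))


class StoryDecoder(nn.Module):
    def __init__(self, cfg: StoryConfig):
        super().__init__()
        self.cfg = cfg
        self.tok_emb = nn.Embedding(cfg.vocab_size, cfg.d_model)
        self.pos_emb = nn.Embedding(cfg.max_seq_len, cfg.d_model)
        self.blocks = nn.ModuleList([Block(cfg) for _ in range(cfg.n_layers)])
        self.ln_f = nn.LayerNorm(cfg.d_model)
        self.head = nn.Linear(cfg.d_model, cfg.vocab_size, bias=False)
        self.head.weight = self.tok_emb.weight  # weight tying shrinks the checkpoint

    def forward(self, idx):
        B, T = idx.shape
        x = self.tok_emb(idx) + self.pos_emb(torch.arange(T, device=idx.device))
        # True above the diagonal = "may not attend to future tokens".
        mask = torch.triu(
            torch.ones(T, T, dtype=torch.bool, device=idx.device), diagonal=1
        )
        for blk in self.blocks:
            if self.cfg.grad_checkpoint and self.training:
                x = checkpoint(blk, x, mask, use_reentrant=False)
            else:
                x = blk(x, mask)
        return self.head(self.ln_f(x))


if __name__ == "__main__":
    model = StoryDecoder(StoryConfig())
    print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")
```

The print at the end is just a quick way for me to double-check the parameter count against my estimate above.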

If you have any suggestions on the Architecture, Training, Platform, or any other aspect, please share them; I’m open to any ideas or tweaks. For context, a rough sketch of the training step I’m planning is below.
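
This is roughly the training step I have in mind for the T4: FP16 autocast with loss scaling, plus gradient accumulation so the effective batch can go beyond 16 without extra VRAM. `train_one_epoch` and `accum_steps=4` are placeholder names/values, `model` is assumed to be the `StoryDecoder` sketch above, and `loader` is assumed to yield token batches of shape (B, T+1):

```python
# Rough sketch of an FP16 training step with gradient accumulation.
import torch
import torch.nn.functional as F


def train_one_epoch(model, loader, optimizer, accum_steps=4, device="cuda"):
    scaler = torch.cuda.amp.GradScaler()
    model.train()
    optimizer.zero_grad(set_to_none=True)

    for step, tokens in enumerate(loader):
        tokens = tokens.to(device, non_blocking=True)
        inputs, targets = tokens[:, :-1], tokens[:, 1:]

        # FP16 forward/backward; the loss is scaled to avoid gradient underflow.
        with torch.cuda.amp.autocast(dtype=torch.float16):
            logits = model(inputs)
            loss = F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
            )
        scaler.scale(loss / accum_steps).backward()

        # Step only every accum_steps mini-batches:
        # effective batch = loader batch size * accum_steps.
        if (step + 1) % accum_steps == 0:
            scaler.unscale_(optimizer)  # unscale before clipping
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad(set_to_none=True)
```

My thinking is that accumulation is usually a safer way to grow the effective batch on a 16 GB card than raising the per-step batch size, but I’d love corrections on that too.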

Thanks for your time! Much appreciated!
