I have been a technical architect for many years, but I am very new to machine learning. I have a requirement and would like some help from the community.
I want to fine-tune a model on the JS stack. Currently, all the big LLMs are general purpose and cover everything. What I want instead is a custom model trained only for the JavaScript stack, and not even the whole JS ecosystem, just the specific frameworks that I use day to day. The idea is to have a tiny model that can run on my MacBook and is strictly about coding. These are the things I am looking for in it:
- Code completion
- Explain the code
- Document the code (optional)
- Review code and suggest optimizations (based on best practices)
- Find issues and fix bugs
- Ability to mention a framework and get answers strictly based on it. For example, if I say @express, it should answer only on the basis of the Express.js framework.
- Ability to chat and ask coding questions
- Understand my code base and give answers based on it
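To make the @framework idea above a bit more concrete, here is a rough sketch of how I imagine the tag being pulled out of a prompt before it reaches the model. Everything in it is an assumption for illustration only: the `parsePrompt` helper, the `KNOWN_FRAMEWORKS` list, and the exact tag syntax are hypothetical, and how the extracted tag would actually restrict the model's answer is still an open question for me.

```typescript
// Hypothetical list of frameworks the model would be trained on (assumption).
const KNOWN_FRAMEWORKS: string[] = ["express", "react", "next"];

interface ParsedPrompt {
  frameworks: string[]; // recognised tags, lower-cased, without duplicates
  question: string;     // prompt text with the recognised tags removed
}

// Extracts "@framework" tags that start the prompt or follow whitespace.
// Unknown tags are left in the text untouched.
function parsePrompt(prompt: string): ParsedPrompt {
  const frameworks: string[] = [];
  const question = prompt
    .replace(
      /(^|\s)@([A-Za-z][A-Za-z0-9_-]*)/g,
      (match: string, lead: string, name: string): string => {
        const key = name.toLowerCase();
        if (KNOWN_FRAMEWORKS.indexOf(key) === -1) {
          return match; // not a framework we know about, keep as-is
        }
        if (frameworks.indexOf(key) === -1) {
          frameworks.push(key);
        }
        return lead; // drop the tag, keep the whitespace before it
      }
    )
    .replace(/\s+/g, " ")
    .trim();
  return { frameworks, question };
}

const parsed = parsePrompt("@express how do I add middleware?");
console.log(parsed.frameworks); // [ 'express' ]
console.log(parsed.question);   // how do I add middleware?
```

The extracted `frameworks` value could then be used, for instance, to pick a system prompt or to filter which documentation is retrieved, but that part is only a guess on my side.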
I know I have listed a lot here, but this is the roadmap I wish to follow. I want everything to be open source, without paying any of the big vendors. Of course, I will open-source the result as well.
I am seeking help from the community to guide me in the right direction:
- Which base model should I pick? This model is going to be strictly for coding, so I don't want all the other stuff that is baked into the big LLMs. It should be the lightest model that can be fine-tuned and still do the job.
- Should I choose a plain base model or an instruct-tuned one, given that I also want to be able to chat with it?
- What is the best way to fine-tune the model?
- What are the best resources to use for training the model?
- Which datasets should I use to train the model, and which documentation should I use for best practices and similar material?
- So far I could find only one JS dataset online, www.sri.inf.ethz.ch/js150. I don't know its quality. If someone knows it, please tell me whether it can be used for training.
- I have a MacBook Pro M2 with 32 GB RAM. Will that be sufficient, or will I need a paid solution? If a paid solution is needed, which one gives the best value for the lowest cost?
- Is there anything else I should take care of?
Many thanks in advance. I am hoping for support from this wonderful community.