How can I prepare my own dataset and fine-tune a model with it?
Those are pretty big questions that are not easily answerable in a couple of well-thought-out paragraphs. How much background and experience do you have in Machine Learning? Have you considered taking any of the courses here as a way to learn how to do the things you are asking? If you are already a Python programmer, have a good grasp of the fundamentals of Linear Algebra (operations on vectors and matrices), and want to learn ML, the best path would be to start with the Machine Learning Specialization and then follow it with the Deep Learning Specialization.
Note that Data Science is a separate topic altogether. In MLS and DLS, you’ll see lots of examples of datasets used for training and evaluating models, but they don’t really cover much about how to gather data and curate a dataset. There is also a specialization here called Practical Data Science. I have not personally taken that one yet, but it sounds worth a look for your questions above.
Practical Data Science update
I am not sure if it is still available for new learners. It was available until January 7, 2024, and the certificate could be obtained until July 2024.
regards
DP
@Ahmed112 Personally, to start out, I think one big consideration is what kind of model we are talking about. And do you mean strictly "fine-tuning", or transfer learning instead?
Your data use case could be very different depending on the model and the type of data of interest.
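To make the distinction concrete, here is a rough sketch of what I mean, using Keras with MobileNetV2 as a stand-in base model and a binary image task purely as an example (none of these choices come from the original question):

```python
import tensorflow as tf

# A pre-trained base model (MobileNetV2 here, only as an illustration).
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet"
)

# Transfer learning: freeze the base and train only a new classification head.
base.trainable = False
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # e.g. a binary task
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)

# Fine-tuning: later, unfreeze the base and keep training with a much smaller
# learning rate so the pre-trained weights shift only slightly.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)
```

The dataset you need to prepare looks quite different depending on which of these you are doing and on whether the model is a vision model, an LLM, etc.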
I recently did this with CrewAI: basically, I scraped the documentation of a backtesting library to .txt files, created an agent workflow to reformat the text in the style of the cleaned Alpaca dataset, and then used the Unsloth Llama 3 template to fine-tune. Although the results are not consistent and it currently handles one file at a time, it works.
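As a rough illustration of the dataset-formatting step only (leaving out the CrewAI agent part), here is a minimal sketch that turns a folder of scraped .txt files into Alpaca-style JSONL records. The folder name, output file, and instruction wording are all hypothetical placeholders, and the blank-line chunking is deliberately naive:

```python
import json
from pathlib import Path

DOCS_DIR = Path("scraped_docs")       # hypothetical folder of scraped .txt files
OUT_FILE = Path("train_alpaca.jsonl")  # hypothetical output path

def to_alpaca_record(title: str, body: str) -> dict:
    # Alpaca-style records use "instruction", "input", and "output" fields.
    return {
        "instruction": f"Explain this section of the library documentation: {title}",
        "input": "",
        "output": body.strip(),
    }

with OUT_FILE.open("w", encoding="utf-8") as out:
    for txt_file in sorted(DOCS_DIR.glob("*.txt")):
        text = txt_file.read_text(encoding="utf-8")
        # Naive chunking on blank lines; an agent workflow (e.g. CrewAI) would
        # produce much cleaner instruction/response pairs than this.
        for chunk in filter(None, (c.strip() for c in text.split("\n\n"))):
            out.write(json.dumps(to_alpaca_record(txt_file.stem, chunk),
                                 ensure_ascii=False) + "\n")
```

A JSONL file like this can then be loaded with the Hugging Face datasets library, e.g. `load_dataset("json", data_files="train_alpaca.jsonl")`, and plugged into the prompt-formatting step of the Unsloth notebook.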