How can I prepare my own dataset and fine-tune a model with it?
Those are pretty big questions that are not easily answerable in a couple of well-thought-out paragraphs. How much background and experience do you have in Machine Learning? Have you considered taking any of the courses here as a way to learn how to do the things you are asking? If you are already a Python programmer, have a good grasp of the fundamentals of Linear Algebra (operations on vectors and matrices), and want to learn ML, the best path would be to start with the Machine Learning Specialization and then follow it with the Deep Learning Specialization.
Note that Data Science is a separate topic altogether. In MLS and DLS, you’ll see lots of examples of datasets used for training and evaluating models, but they don’t really cover much about how to gather data and curate a dataset. There is also a specialization here called Practical Data Science. I have not personally taken that one yet, but it sounds worth a look for your questions above.
Practical Data Science update
I am not sure if it is still available for new learners. It was available until January 7, 2024, and the certificate could be obtained until July 2024.
regards
DP
@Ahmed112 Personally, to start out, I think one big consideration is what kind of model we are talking about. And do you mean strictly "fine-tuning", or transfer learning instead?
Your data use case could be very different depending on the model and the type of data of interest.
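To make the distinction concrete, here is a rough sketch of what I mean, using Keras with MobileNetV2 as a stand-in base model and a binary image task purely as an example (none of these choices come from the original question):

```python
import tensorflow as tf

# A pre-trained base model (MobileNetV2 here, only as an illustration).
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet"
)

# Transfer learning: freeze the base and train only a new classification head.
base.trainable = False
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # e.g. a binary task
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)

# Fine-tuning: later, unfreeze the base and keep training with a much smaller
# learning rate so the pre-trained weights shift only slightly.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)
```

The dataset you need to prepare looks quite different depending on which of these you are doing and on whether the model is a vision model, an LLM, etc.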
I recently did this with CrewAI: basically, I scraped the documentation of a backtesting library to .txt files, created an agent workflow to reformat the text in the style of the cleaned Alpaca dataset, and then used the Unsloth Llama 3 template to fine-tune. Although the results are not consistent and it currently handles one file at a time, it works.
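As a rough illustration of the dataset-formatting step only (leaving out the CrewAI agent part), here is a minimal sketch that turns a folder of scraped .txt files into Alpaca-style JSONL records. The folder name, output file, and instruction wording are all hypothetical placeholders, and the blank-line chunking is deliberately naive:

```python
import json
from pathlib import Path

DOCS_DIR = Path("scraped_docs")       # hypothetical folder of scraped .txt files
OUT_FILE = Path("train_alpaca.jsonl")  # hypothetical output path

def to_alpaca_record(title: str, body: str) -> dict:
    # Alpaca-style records use "instruction", "input", and "output" fields.
    return {
        "instruction": f"Explain this section of the library documentation: {title}",
        "input": "",
        "output": body.strip(),
    }

with OUT_FILE.open("w", encoding="utf-8") as out:
    for txt_file in sorted(DOCS_DIR.glob("*.txt")):
        text = txt_file.read_text(encoding="utf-8")
        # Naive chunking on blank lines; an agent workflow (e.g. CrewAI) would
        # produce much cleaner instruction/response pairs than this.
        for chunk in filter(None, (c.strip() for c in text.split("\n\n"))):
            out.write(json.dumps(to_alpaca_record(txt_file.stem, chunk),
                                 ensure_ascii=False) + "\n")
```

A JSONL file like this can then be loaded with the Hugging Face datasets library, e.g. `load_dataset("json", data_files="train_alpaca.jsonl")`, and plugged into the prompt-formatting step of the Unsloth notebook.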