Fine-tuning on a customized dataset

I want to fine-tune on a customized dataset instead of the DialogSum dataset used in the lab. First of all, how can I create a Buruli ulcer training dataset? And if I create one, should it be in the same format as the dialogue summarization training data on Hugging Face?
Generative AI with Large Language Models Week 2

It's not easy to create a dataset, actually. You could search for a relevant dataset online. Other than that, use whatever options you have. I would not say "use your imagination" unless your project is only for you.

And yes, if you are using the same lab structure and goal, it needs to be in the form of a dialogue.
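
As a rough sketch of what "the same form" could look like: the DialogSum data used in the lab has `dialogue` and `summary` text columns, so one option is to store your own examples as JSON Lines records with those fields. The record below, the `custom_0` id, the `train.jsonl` file name, and the helper functions are all made up for illustration; they are placeholders, not real training data or part of the lab.

```python
import json
import tempfile
from pathlib import Path

# Fields assumed to mirror the DialogSum columns used in the lab.
REQUIRED_FIELDS = ("id", "dialogue", "summary")

# Hypothetical placeholder record; replace with your own collected data.
records = [
    {
        "id": "custom_0",
        "dialogue": (
            "#Person1#: What is Buruli ulcer?\n"
            "#Person2#: It is a skin infection caused by Mycobacterium ulcerans."
        ),
        "summary": (
            "#Person2# tells #Person1# that Buruli ulcer is a skin infection "
            "caused by Mycobacterium ulcerans."
        ),
    },
]


def validate(row):
    """Raise ValueError if a required field is missing or empty."""
    for field in REQUIRED_FIELDS:
        value = row.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"record is missing a non-empty '{field}' field")


def write_jsonl(rows, path):
    """Write one validated JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            validate(row)
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the file back as a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Write to a temporary directory so the sketch does not touch your project files.
out_path = Path(tempfile.mkdtemp()) / "train.jsonl"
write_jsonl(records, out_path)
loaded = read_jsonl(out_path)
print(len(loaded), "record(s) written to", out_path)
```

A file like this can usually be loaded with `load_dataset("json", data_files="train.jsonl")` from the Hugging Face `datasets` library, after which the lab's tokenization step should work on the same column names. I have not tested that against the lab notebook, so treat it as a starting point.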

Thanks for the insight. Is there a way to get a Question and Answer (QnA) notebook instead of the DialogSum notebook used in the lab, in case I want my project’s scope to be QnA instead of summarization?

I don't know about that, or where to find one!

I also don’t know where to find datasets like that. Another thing you can do by hand is collect the data through web scraping, if the site you want to scrape allows it.

Please keep in mind that you shouldn’t do any web scraping if you are not sure about the site's privacy policy and terms. Read through their privacy section, do some research, and ask for permission.
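
One small thing you can check programmatically is a site's `robots.txt`, which states which paths crawlers are asked to stay out of. Here is a minimal sketch using Python's standard `urllib.robotparser`. The `is_allowed` helper, the `my-research-bot` user agent, the example rules, and the `example.com` URLs are all hypothetical. Note that `robots.txt` only covers crawler rules, so it does not replace reading the site's terms and privacy policy or asking for permission.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real one lives at https://<site>/robots.txt.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Check one path that the example rules leave open and one they disallow.
print(is_allowed(EXAMPLE_ROBOTS, "my-research-bot", "https://example.com/articles/1"))
print(is_allowed(EXAMPLE_ROBOTS, "my-research-bot", "https://example.com/private/data"))
```

For a live site you would fetch the real file instead, for example with `RobotFileParser.set_url(...)` followed by `read()`, before checking each URL you plan to request.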