When you create an ML translation model or an ML-based chatbot, do you need a large dataset?

If so, where do you find one?

Hi @yujin_yano

It depends on the model.

  • If you train the model from scratch, you do need a large dataset.
  • If you fine-tune a model (already pre-trained on the task), then a smaller dataset might be enough.

One place you can find them is Hugging Face Datasets, in particular:

Depending on the languages you want to translate, you could find more datasets online (for example, the Europarl Parallel Corpus).
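If you download a parallel corpus as raw text, it often comes as two files, one per language, where line i of one file is the translation of line i of the other. Below is a minimal sketch (standard library only) that turns such a pair of files into (source, target) sentence pairs for training. It assumes that line-aligned layout and UTF-8 encoding, and it skips pairs where either side is blank, since alignment gaps sometimes show up as empty lines. The function names `pair_sentences` and `load_parallel_corpus` are hypothetical, not part of any library.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, List, Tuple

# Sentinel used to detect when one file runs out of lines before the other.
_MISSING = object()


def pair_sentences(
    source_lines: Iterable[str], target_lines: Iterable[str]
) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) pairs from two line-aligned sequences.

    Assumes line i of the source side translates line i of the target side.
    Pairs where either side is blank are skipped.
    """
    for src, tgt in zip_longest(source_lines, target_lines, fillvalue=_MISSING):
        if src is _MISSING or tgt is _MISSING:
            raise ValueError("source and target have different line counts")
        src, tgt = src.strip(), tgt.strip()
        if src and tgt:
            yield src, tgt


def load_parallel_corpus(source_path: str, target_path: str) -> List[Tuple[str, str]]:
    """Read two line-aligned UTF-8 text files into a list of sentence pairs."""
    with open(source_path, encoding="utf-8") as src_file, open(
        target_path, encoding="utf-8"
    ) as tgt_file:
        return list(pair_sentences(src_file, tgt_file))
```

For example, with a hypothetical English/French file pair you would call `load_parallel_corpus("europarl-v7.fr-en.en", "europarl-v7.fr-en.fr")`; check the actual file names of whatever you download. Raising on mismatched line counts, rather than silently truncating, makes a misaligned download obvious before you train on it.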

Cheers


I see. Thank you for your answer, it's helpful.
I'll look up the datasets.