In the video “Tips for getting started”, Andrew talks about trying out code on a small subset of data before training on the whole training set. For many complex algorithms, we know that they perform better when given more and more data.
So, if we give only a small subset of data to the algorithm, how will we know whether it is performing well or badly?
Hi @mukul1997
Welcome to our community.
I think what Dr. Andrew Ng suggests is just a sanity check of the model before spending hours training it on a large dataset.
He gives the example of a speech recognition system. He tried to overfit his model on just one audio clip from the training set and realized that the system returned ‘space, space, space, space, space, space’. Clearly it wasn’t working. There was no point in spending hours and hours training it on a giant training set if it couldn’t even fit a tiny one.
So the tip Andrew Ng recommends is to try the model on a small dataset first, to check that it works at all and avoid spending many hours training it on a giant dataset. He is not talking about estimating the model’s final performance — you still need the full dataset for that.
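To make the idea concrete, here is a minimal sketch of that kind of sanity check (my own illustrative example, not code from the course): a simple logistic regression trained on a tiny, hand-made subset of 8 points. If the implementation is correct, it should be able to fit (even overfit) such a small separable set perfectly; if it can’t, something is broken and there is no point training on millions of examples.

```python
import numpy as np

# Sanity check: before training on the full dataset, verify the model
# can fit a tiny subset. A working model should reach ~100% accuracy here.

# Tiny, linearly separable "subset" of 8 examples with 2 features
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
              [2.0, 2.0], [2.1, 1.9], [1.9, 2.2], [2.2, 2.1]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Simple logistic regression trained with full-batch gradient descent
w = np.zeros(2)
b = 0.0
lr = 0.5
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
    grad_w = X.T @ (p - y) / len(y)          # gradient of log loss w.r.t. w
    grad_b = np.mean(p - y)                  # gradient w.r.t. bias
    w -= lr * grad_w
    b -= lr * grad_b

train_acc = np.mean(((X @ w + b) > 0) == y)
print(train_acc)
```

If `train_acc` is well below 1.0 on a set this small and this easy, the bug is in the code (loss, gradients, data pipeline), not in the amount of data — which is exactly the kind of failure the small-subset check catches cheaply.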
Hope this helps
Regards