Hi everyone, I've been struggling to pin down what people mean by "a lot" of data. Can anyone define what counts as "a lot" of data in machine learning - how much data do we need to have before it can be called "a lot"?
The amount of data required depends on the problem at hand. For instance, to build a classifier for a simple decision boundary, just tens of data points might be sufficient. On the other hand, building an image captioning system or a language translation model from scratch requires far more data.
What if we want to build a regression model for forecasting or prediction?
Are there general guidelines for determining whether our data is considered “large”/”big data”?
This also affects whether we should use traditional machine learning or deep learning for modeling later on, right?
Even for a regression problem, if the output is a simple affine transformation (i.e. y=mx+b) of the input, odds are good that a simple neural network with 10s of data points might be able to get it right.
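As a quick illustration of that point, here is a minimal sketch (using synthetic data I made up for the example) showing that a plain least-squares line fit - which is what a single linear neuron would learn - can recover y = mx + b from only 20 noisy points:

```python
import numpy as np

# Hypothetical data: 20 noisy samples of y = 3x + 2.
rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, size=20)
y = 3 * x + 2 + rng.normal(0, 0.1, size=20)

# Least-squares fit of a degree-1 polynomial, i.e. y = m*x + b.
m, b = np.polyfit(x, y, deg=1)
print(f"m ≈ {m:.2f}, b ≈ {b:.2f}")
```

With so little noise relative to the signal, the recovered slope and intercept land very close to the true values, which is the point: a simple target function needs only a small dataset.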
There are quite a few guidelines on when and how to consider adding more data when training a model to achieve the desired performance. Please see the Deep Learning Specialization for more details. The specialization also covers what to do when you have limited data and how to leverage existing models for the problem at hand.
Consider starting with traditional ML models for tabular data, and neural networks for unstructured data like audio, text, and video.
Just one possible guideline, this is certainly not any sort of rule:
For simple regressions, the number of training examples must be much greater than the number of features.
I start with "much greater" being around 10x. That may (or may not) be sufficient to get good performance; it depends entirely on the complexity of the model.
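That rule of thumb is easy to turn into a quick sanity check. Here is a sketch (the function name and the 10x default are just my illustration of the heuristic above, not a standard API):

```python
def enough_data(n_samples: int, n_features: int, ratio: int = 10) -> bool:
    """Rough heuristic, not a rule: flag whether the dataset meets the
    ~10 training examples per feature starting point for a simple regression."""
    return n_samples >= ratio * n_features

# 500 samples, 20 features: 500 >= 200, so the heuristic passes.
print(enough_data(500, 20))
# 100 samples, 20 features: 100 < 200, so consider gathering more data.
print(enough_data(100, 20))
```

Again, passing this check only means you have a reasonable starting point; actual performance on a validation set is what tells you whether the data is truly sufficient.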