Okay, this is an unusual discussion compared to what I normally ask here on the Discourse, but let's take a shift.
In my view:
Machine learning is more prone to overfitting, which can be fixed with regularization, while human learning (what we do) is more prone to underfitting, which is hard to fix with regularization (thanks @Juan_Olano), since we cannot compete with computers in processing more information (I don't know why, but let's assume it for now).
Again, this claim could be wrong, as it is my "assumption".
Shouldn't we solve underfitting with more data instead of regularization? If we review the lessons, isn't regularization more a tool to solve overfitting than underfitting?
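To make that point concrete, here is a minimal sketch (scikit-learn on synthetic data; the polynomial degree, noise level, and alpha values are just illustrative choices of mine, not from the course): ridge regularization reins in an overfit polynomial, but it cannot rescue an underfit line, because regularization only removes capacity, it never adds any.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (60, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(0, 0.2, 60)  # noisy nonlinear target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A degree-15 polynomial overfits; ridge regularization reins it in.
for alpha in (1e-6, 1.0):
    poly = make_pipeline(PolynomialFeatures(15), Ridge(alpha=alpha))
    poly.fit(X_tr, y_tr)
    print(f"alpha={alpha}: train={poly.score(X_tr, y_tr):.2f}, "
          f"test={poly.score(X_te, y_te):.2f}")

# A plain line underfits sin(3x), and no choice of alpha fixes that.
line = LinearRegression().fit(X_tr, y_tr)
print(f"plain line: test={line.score(X_te, y_te):.2f}")
```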
Also, to understand your point better, I'd like to ask how you see 'regularization' applied to human learning in the context of your question. Why not consider, for instance, having the human acquire more data, or expand their mental models (similar to adding layers in the ML world)?
Not regularization, that was my mistake, but rather "we cannot compete with computers in processing more information (I don't know why, but let's assume it for now)". They are better at calculation than we are; at least I am not a pro with numbers like Srinivasa Ramanujan sir.
As far as consuming data, memorizing, calculating, providing speedy answers to logical questions, etc., computers have an advantage over humans. Humans still have the advantage in other aspects, and I'll bring up two that I recently heard from Yann LeCun: humans can plan and set objectives, and machines are still unable to do this. So if the model requires planning and objective setting, then the tables will turn.
I was discussing this recently with some friends and they argued that machines can plan and set objectives. I think Mr. LeCun was referring to more abstract planning and objective-setting. Maybe machines can plan a budget, or a PERT model for a project, or estimate objectives based on that type of input, but planning 'for life' or 'to change one's circumstances', things like that, which is what I think Mr. LeCun was referring to, ML cannot do.
According to Mr. LeCun again, that would be the next inflection point. He mentioned that this may still be a few years ahead, if it ever happens. But if it does, then we could be in front of ML that is closer to sentient.
Thank you for your note. I’d like to share 2 thoughts regarding your comment:
More data is one of the ways to solve underfitting. If you train your model on more data (which in practice also means training for longer), the model has more examples to learn from, allowing it to better capture the underlying patterns in the data and improve its performance.
In my first comment, 2nd paragraph, I also say 'expand mental models (similar to adding layers in the ML world)', which would cover your proposed 'more complex model', which, as you very well say, is another way to solve underfitting.
Do you have a source for this or an example where you experienced this effect?
Would be quite curious!
All in all, a great question: thanks @tbhaxor for bringing this up. I believe there is some truth to it, but mainly due to psychological or game-theoretic reasons, I guess:
ML tends to overfit because the AI engineer often has (at least implicit) incentives to report very good (or even best-in-class) results, sometimes even caused by external pressure from stakeholders.
(Over)simplifying things is something human beings sometimes need to do to deal with such a complex world, and this contributes towards underfitting.
I guess I can search for sources… I think I remember that at some point in MLS or DLS, Prof. Ng mentions that a very simple dataset may cause underfitting. If you add more features (producing a more complex dataset) and you train for longer, you can help the model find patterns.
Here I completely agree, @Juan_Olano: adding features for the same number of labels, i.e. increasing the model complexity by adding new features/dimensions, will clearly help to reduce underfitting. However, I would argue this is due to the higher complexity of the model rather than the data itself.
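A quick sketch of that point (scikit-learn; the quadratic ground truth is a toy example of my own): with the same 100 labels, one engineered feature removes the underfit, so it is the added model capacity doing the work.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 100)
y = x**2 + rng.normal(0, 0.1, 100)        # quadratic ground truth, 100 labels

plain = LinearRegression().fit(x.reshape(-1, 1), y)
print(f"x only:    R^2 = {plain.score(x.reshape(-1, 1), y):.2f}")  # underfits

X2 = np.column_stack([x, x**2])           # same labels, one crafted feature
richer = LinearRegression().fit(X2, y)
print(f"x and x^2: R^2 = {richer.score(X2, y):.2f}")               # close to 1
```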
If the model complexity is the limiting factor, adding data would not help much to reduce underfitting, at least not in this example. Let's assume:
a linear regression model
ground truth labels follow a sin(t) behaviour over two full periods
you have 100 labels for this
Increasing the 100 labels to 1,000 labels would not help. The limited capacity of the 1D linear model (only a bias and a weight as parameters) is the limiting factor. Choose a nonlinear domain model, or an AI/data-driven model such as a Gaussian process with the right kernel, and it will tackle the underfitting far better, even with well under 100 data points; see the sketch below.
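Here is roughly that example in code (scikit-learn; the noise level and the periodic-kernel settings are assumptions of mine): going from 100 to 1,000 labels leaves the linear model's fit flat, while a Gaussian process with a periodic kernel does well with only 50 points.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ExpSineSquared

rng = np.random.default_rng(0)

def sine_data(n):
    t = np.linspace(0, 4 * np.pi, n)      # two full periods of sin(t)
    return t.reshape(-1, 1), np.sin(t) + rng.normal(0, 0.1, n)

for n in (100, 1000):                     # 10x more labels, same underfit
    X, y = sine_data(n)
    lin = LinearRegression().fit(X, y)
    print(f"n={n}: linear R^2 = {lin.score(X, y):.2f}")    # stays near 0

X, y = sine_data(50)                      # well under 100 points
gp = GaussianProcessRegressor(kernel=ExpSineSquared(periodicity=2 * np.pi),
                              alpha=0.1**2).fit(X, y)
print(f"n=50: periodic-kernel GP R^2 = {gp.score(X, y):.2f}")  # close to 1
```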
With more data, at least we learn that the linear technique (or whatever the current technique is) will not work and that we need a more complex model. Without more data, I don't see how we would learn that "hey, this looks like curved data, let's use higher-order polynomials". For example, say we have 2 points: we can draw a line through them, but the exact picture only emerges when we have more points.
So, according to me, both more data and a more complex model are required to fix underfitting.
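The two-points intuition in a tiny sketch (plain numpy, made-up numbers): a line through 2 samples of a curve fits them perfectly, so only additional samples can reveal that the linear model is the wrong choice.

```python
import numpy as np

t = np.array([0.0, 1.0])                       # 2 samples of y = t^2
y = t**2
slope, intercept = np.polyfit(t, y, 1)         # line fits 2 points exactly
print(np.max(np.abs(slope * t + intercept - y)))           # ~0.0

t_more = np.linspace(0, 1, 10)                 # more data exposes curvature
y_more = t_more**2
print(np.max(np.abs(slope * t_more + intercept - y_more))) # clearly > 0
```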
What I want to point out is that I see a definition issue at the core of this thread:
by "more data", @Juan_Olano highlighted rather more features (which also corresponds to higher model complexity, not only because of more model parameters due to higher dimensions, but also because the "non-linearity" is now described in a well-crafted (= modelled) feature; for sure he is absolutely right that this helps to tackle underfitting),
whereas by "more data", what is often meant is: more labels.
True (besides the fact that this is too little data [too few labels] to entertain a reasonable train/dev/test split; anyway, let's assume you use it all for fitting a model). However, based on the available data, I am not sure this is classic underfitting:
More abstractly: in this extreme example you have far too little data [too few labels] to describe the business problem (but not necessarily too little to fit a super simple model). In this scenario I would strongly recommend taking a look at Active Learning, which can help to find high-quality labels.
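For anyone curious, a hedged sketch of one common active-learning recipe, uncertainty sampling (scikit-learn; the dataset, label budget, and query rule are my assumptions, not a reference implementation): the model repeatedly asks for the label it is least sure about.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Start with 5 labeled examples per class; the rest form the unlabeled pool.
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
pool = [i for i in range(len(y)) if i not in labeled]

for _ in range(20):  # 20 query rounds, one new label per round
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[pool])[:, 1]
    query = int(np.argmin(np.abs(proba - 0.5)))  # most uncertain pool point
    labeled.append(pool.pop(query))              # "ask the oracle" for it

clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
print(f"accuracy with only {len(labeled)} labels: {clf.score(X, y):.2f}")
```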
We are all on the same page: more data never harms!
The question is: where does it help when it comes to underfitting as written above.
I think that it will depend. The initial post is, in general, Humans vs Machine. There are no specifics as to what problem we are facing.
I think that we humans, in general, don't need a lot of samples of a given topic, but rather a lot of 'features'. We humans don't need to see 10,000 samples to predict house prices. We usually gather more 'features' from far fewer samples and are capable of providing a good estimate. And many of these features are not even 'conscious' or hard data.
Machines, on the other hand, do need more data; that is, plenty of samples with the right amount of quality features.
And again: this comment is in light of the title of this post. We can certainly argue that scientists in a lab will need lots of samples to solve a problem.
Haha, I love this comment; it makes a lot of sense to me! Humans would be closer to general AI, so we use less data on any specific problem, and we tend to reduce our problem space to 1-7 criteria/features to help us decide. If it's a deeper problem, we might put down our thoughts on paper and get to 7, but if it's a simple problem or we need to decide on the fly, maybe 1-3. So underfitting is natural in this case.
ML models are typically trained to answer a specific problem, but in my experience with business problems, the data often misses contextual information that has not been converted into data, and there are too many ways to fit the model based on the data available; hence the overfitting.
Let me first share a business case where I use AI and machine learning.
Specifically, I use AI and machine learning on 12 of my RC (remote control) DJI drones. The 12 RC drones monitor and forecast algae growth in a 6-mile lake. The drones have also developed their own decision logic to determine (1) whether they can fly based on current conditions, (2) when they need to recharge, (3) when they require any parts replacement, and (4) how best to gather the video data to counteract the wind or weather conditions.
Our RC drones upload their videos to our big-data cloud on a daily basis, where a forecast analysis is created of where the algae will be in the next 12-24 hours. This helps our team (and other stakeholders) understand where to collect the algae before conditions get worse.
@scabalqu, a question for you: when you say that the drones have learned, does that mean you have embedded the ML models in the drones? Or is this done in the cloud, with a process there that manages the drones?
What model of DJI drones are you using?
Are the videos processed using YOLO? If not, what model are you using?
Does each drone cover its own section of the lake, or do they overlap?