How can ML models become smarter? Can someone point me toward real-life examples of how to do this in Python? For example, I used to be a product manager at Facebook, and my teams made heavy use of ML models to predict user behavior, such as predicting CTRs for notifications. And the models got smarter as they got more data about users.
But when I code ML models myself, what I’m finding is that the only way to make the model “smarter” is to retrain it by doing a new fit() with more data. Is this how ML models become smarter? Thanks!
There is another training method, online learning via stochastic gradient descent, where training happens continually and incrementally as each new observation is obtained.
It has the advantage of continually being updated, but it has the disadvantage of being noisy, time-lagged, and never really converging on the optimum solution.
Hi, @David_Park !
That’s right! Having more data is always helpful because the model can learn from a better representation of the reality you are trying to model.
In addition, if by getting smarter you mean getting better performance metrics, you can always try to tweak your model: adjust some hyperparameters (batch size, learning rate, …), apply data augmentation techniques, etc.
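A common way to do that tuning systematically is a cross-validated grid search. Here is a small sketch with scikit-learn's `GridSearchCV` (the estimator, dataset, and parameter grid are just illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Try every combination in a small hyperparameter grid,
# scoring each with 5-fold cross-validation
grid = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1, 10], "gamma": ["scale", 0.1]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```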
Feature engineering can help with that as well. Yes, some models can engineer their own features to some extent, but if you can help them out by using domain knowledge to point them in the right direction, then they’ve got an easier job to do.
Example: for another course I had a project about predicting how much a taxi ride would cost. Two of the features were pickup and drop-off coordinates. Neither of them affects the fare much on its own. As the model designer you know that the distance between the two points is a decent estimate of the mileage on the meter - if you calculate that distance and make it a feature, the correlation with the fare is strong enough for the model to learn it easily. If all the model had was the raw coordinates it might not independently come up with such an effective way to combine them. Or if it did, that would take up training time & parameters that could’ve been spent on discovering patterns in the data that you didn’t already know about.
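To sketch that engineered feature in code: the haversine formula gives the great-circle distance between two coordinates, which you can add as a new column (the column names and values here are hypothetical, just to illustrate the idea):

```python
import numpy as np
import pandas as pd

# Hypothetical taxi-ride data; column names are illustrative
df = pd.DataFrame({
    "pickup_lat": [40.75, 40.76],
    "pickup_lon": [-73.99, -73.98],
    "dropoff_lat": [40.78, 40.70],
    "dropoff_lon": [-73.95, -74.01],
})

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

# Engineered feature: trip distance correlates with the fare far more
# strongly than the four raw coordinate columns do on their own
df["trip_km"] = haversine_km(df["pickup_lat"], df["pickup_lon"],
                             df["dropoff_lat"], df["dropoff_lon"])
print(df["trip_km"].round(2).tolist())
```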
Thanks @TMosh , @alvaroramajo , and @Andrei_Landon for the help!
Are there any other methods like stochastic gradient descent, where new observations incrementally update the model?
There are several other algorithms and techniques that support incremental (online) learning, where the model is updated as new data becomes available:
- Perceptron: a simple online learning algorithm for binary classification.
- Winnow: also used for binary classification; it updates model weights incrementally based on misclassifications.
- Vowpal Wabbit: a library built around incremental updates, widely used for both classification and regression tasks.
- Streaming k-means: used for clustering; it can adapt to new data points without reprocessing the entire dataset.
- Bayesian models: can be updated incrementally with new data, making them suitable for continuous learning.
- Temporal Difference (TD) learning: in reinforcement learning, value estimates are updated incrementally from each new experience.
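As one concrete example from the list above, scikit-learn's `MiniBatchKMeans` supports streaming-style clustering via `partial_fit()`, updating the cluster centres from each mini-batch without revisiting earlier data (the two-cluster data here is synthetic):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)

kmeans = MiniBatchKMeans(n_clusters=2, random_state=0, n_init=3)

# Simulate a data stream: two well-separated clusters (around 0 and 5)
# arriving in small batches over time
for _ in range(20):
    batch = np.concatenate([
        rng.normal(loc=0.0, scale=0.5, size=(10, 2)),
        rng.normal(loc=5.0, scale=0.5, size=(10, 2)),
    ])
    # Update cluster centres from this batch only; earlier batches
    # are not reprocessed
    kmeans.partial_fit(batch)

print(kmeans.cluster_centers_)
```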
I would say it becomes fitter (to the data) instead.
You could think that:
- the act of training is to make the model fit the (training) data space so that it can make predictions based on an input (related to the training data)
- when adding new data, the model simply adapts to that new data.
So, one conclusion we could make is that, in your example, when adding new data about user behaviour, retraining the model on all data about that user could actually make the model less accurate at predicting the user's actual behaviour.
That’s because the behaviour changed; the user may no longer follow their past pattern.
As others have said:
Better data (quality, features, normalization, etc.)
Increased training time (e.g. perceptron / neural nets)
I’m reluctant to apply anthropomorphic terms like ‘smarter’. I think it is more productive to be explicit about specific measures of improvement instead. Is it fewer operational false negatives or false positives? More generalizable, e.g. works in both rising and falling markets, or in rainy and sunny conditions? Makes more efficient use of limited development or runtime resources? More explainable? Fewer hallucinations? Once defined as an objective (the ends), each measure suggests a particular way forward (the means), such as those described in the many helpful replies above.
Certainly! Improving the performance of Machine Learning (ML) models involves a combination of techniques beyond simply retraining with more data. While continued training is important, refining the model architecture, optimizing hyperparameters, and adopting more advanced algorithms also contribute significantly. In Python, libraries like TensorFlow and scikit-learn offer tools for model fine-tuning. Additionally, leveraging transfer learning from pre-trained models and employing ensemble methods can boost performance.
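As a small sketch of the ensemble idea mentioned above, scikit-learn's `VotingClassifier` combines the predictions of several model families (the dataset here is synthetic and the estimator choices are just illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem, for illustration only
X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Soft-voting ensemble: average the predicted probabilities of
# two different model families
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_tr, y_tr)
print(round(ensemble.score(X_te, y_te), 3))
```

Ensembling often improves on either member alone because the two models make different kinds of errors, and averaging their probabilities washes some of those errors out.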
I attended a very good course on this subject here.
Though brief, it provides an overview of training and fine-tuning techniques, which I think is what you’re interested in.