While watching the video about why we need activation functions at all, I came up with another question:
Is there any situation where a well-built deep learning model would achieve worse results than other models (linear regression, logistic regression, random forests, etc.)? That is, as I'm learning these topics, I would like to know which tool would produce the best results in each situation and why. But maybe the best tool is always a deep learning model with the proper activation functions, number of units per layer, etc. Is that true?
This is the kind of question that can only be answered in practice; no one can tell you in advance what will happen :).
There may be cases where a DL model performs the same as or worse than other ML models. Other factors matter too, such as the use of computational resources; DL models take a lot of resources in comparison.
One case I have heard of is that XGBoost ensembles perform quite close to, if not better than, DL models on tabular data. But of course this needs to be observed in real-life situations.
Thanks for your answer! The point is, do you know of any examples of situations where a DL model would achieve worse results than other ML models? I don't think it's impossible, but it would be nice to know which factors make this happen, so we know how to decide in future situations.
Hey @AlvaroViudez,
In situations where you have smaller datasets, and when you don't have pre-trained deep learning models related in any way to such datasets, ML models could certainly outperform DL models. But once again, as Gent pointed out, you need to try out both for a definite answer.
In my opinion, a good rule of thumb is: if you have large non-tabular datasets, enough computational resources, and you don't want to perform manual feature engineering, DL models can be the way to go. However, if the datasets are small and tabular in nature, you don't have lots of computational resources, and you are comfortable with manual feature engineering, ML models are a good choice.
Note that this is my personal opinion, and it isn't either theoretically or empirically verified. In fact, I wouldn't even trust it all the time. Ultimately, you have to try both to find a definite answer. I hope this helps.
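Just to make the comparison concrete, here is a minimal sketch of how you could cross-validate a gradient-boosted tree ensemble against a small neural network on a small tabular dataset with scikit-learn. This is not from the course; the dataset and hyperparameters are arbitrary choices, just to show the kind of head-to-head check people mean:

```python
# Toy comparison: gradient-boosted trees vs. a small MLP on a small tabular dataset.
# Dataset and hyperparameters are illustrative only.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)  # 569 rows, 30 tabular features

models = {
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "small MLP": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
    ),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validated accuracy
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```

On datasets this small, the tree ensemble is often at least as good as the neural net while training much faster, but the only way to know for your data is to run the comparison.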
To add to the opinions of @gent.spah and @Elemento, there is a raging debate going on in the area of time series forecasting, where the battle is between classical time series forecasting methods vs ML/DL. Classical time series methods have been hard to beat, though of late DL seems to be making up lost ground.
It must be added here that a hybrid approach is also showing promise.
Hey @shanup,
Can you please elaborate a bit on the "classical time-series forecasting" methods that you are referring to in your reply? I used to believe that classical methods included only ML models along with some concepts from Digital Signal Processing, but it looks like I was wrong.
You are right in a way. But let's keep in mind that the classical time series methods gained prominence and existed before the term ML was coined or became famous. The classical methods belong to the ETS family (exponential smoothing, the Holt-Winters method, etc.) or to the extensive family of ARIMA and its variants. Granted, these methods also involve a learning algorithm to arrive at the final model parameters; still, the classical purists would like to be considered in a different category from an ML method like, say, XGBoost.
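To give a flavour of those two classical families, here is a rough toy sketch using statsmodels. The series is synthetic and the model settings are arbitrary; it is only meant to show the shape of the APIs, not a proper forecasting workflow:

```python
# Illustrative only: Holt-Winters (ETS family) and ARIMA on a synthetic monthly series.
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing  # ETS / Holt-Winters
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(42)
t = np.arange(120)  # 10 "years" of monthly data
y = 50 + 0.3 * t + 5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, t.size)

# Holt-Winters: additive trend + additive seasonality with period 12.
hw = ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=12).fit()
print("Holt-Winters 12-step forecast:", hw.forecast(12))

# ARIMA(p, d, q): here a simple (1, 1, 1) specification.
arima = ARIMA(y, order=(1, 1, 1)).fit()
print("ARIMA 12-step forecast:", arima.forecast(12))
```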
The famous M4 time series forecasting competition was won using a hybrid method: exponential smoothing + RNN. And from there on, DL seems to be gaining respect (or at least getting considered a worthy nemesis) within the classical time series forecasting community.
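To give a flavour of the hybrid idea, here is a rough conceptual sketch: let a classical model capture level, trend and seasonality, and let a small neural network model what remains. This is not the actual M4-winning ES-RNN; the data is synthetic, the settings are arbitrary, and a small MLP on lagged residuals stands in for the RNN part just to keep the example self-contained:

```python
# Conceptual hybrid sketch (not the M4 ES-RNN): Holt-Winters + a small
# neural net on the residuals. All data and hyperparameters are illustrative.
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
t = np.arange(240)
y = 10 + 0.05 * t + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, t.size)
train, test = y[:-24], y[-24:]

# Step 1: the classical component captures trend + seasonality.
es = ExponentialSmoothing(train, trend="add", seasonal="add",
                          seasonal_periods=12).fit()
es_forecast = es.forecast(24)

# Step 2: a small neural net learns the in-sample residuals from lagged windows
# (a stand-in for the RNN part of a real hybrid).
resid = train - es.fittedvalues
window = 12
X = np.array([resid[i:i + window] for i in range(len(resid) - window)])
Y = resid[window:]
nn = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X, Y)

# Step 3: recursively forecast the residuals and add them back.
history = list(resid[-window:])
resid_forecast = []
for _ in range(24):
    nxt = nn.predict(np.array(history[-window:]).reshape(1, -1))[0]
    resid_forecast.append(nxt)
    history.append(nxt)

hybrid_forecast = es_forecast + np.array(resid_forecast)
print("Hybrid MAE:", np.mean(np.abs(hybrid_forecast - test)))
print("ES-only MAE:", np.mean(np.abs(es_forecast - test)))
```

The point is only the division of labour: the classical model does the heavy lifting on structure it handles well, and the learned component corrects what it misses.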