Isn't everything but neural networks obsolete?

Dear community,

After studying the deep learning part of this course, I really ask myself what exactly the motivation could be to use any other model type than neural networks (NNs) for either regression or classification tasks.

  • It seems that NNs are far superior at avoiding bias (“low-bias machines”).
  • Feature selection does not seem to be so much of an issue, as NNs do this for me “automatically”.
  • As long as I have enough data, NNs can avoid high variance.
  • Configuration of hyperparameters seems to be quite easy.
  • The CV error and test error seem to outperform those of other models.

So, are there reasons other than “speed” not to choose NNs for a given machine learning task?

Best regards,
Matthias

2 Likes

Hey @Matthias_Kleine,
Welcome to the community. First of all, let me address some of the points you raised.

Regarding variance and the amount of data: neural networks can also avoid high variance with small datasets by using “transfer learning”. I don’t recall it being discussed in the course, but I thought I’d mention it for your information. You can read more about it here.
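
Just to make the idea concrete, here is a minimal illustrative sketch of transfer learning in Keras; the choice of MobileNetV2, the input shape and the binary classification head are only assumptions for the example, not something prescribed by the course:

```python
# Minimal transfer-learning sketch (illustrative only): reuse a pretrained
# image model as a frozen feature extractor and train a small head on top.
import tensorflow as tf

# Pretrained base with ImageNet weights; MobileNetV2 is just an example choice.
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the base so a small dataset only trains the head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary classification head
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(small_train_ds, validation_data=val_ds, epochs=5)  # with your own (small) dataset
```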

As for the configuration of hyperparameters being easy: I don’t think I would say so. The networks discussed in the course are fairly small, and the course covers only a small number of NN architectures (I believe the standard NN and an intro to CNNs), and hence only a few hyper-parameter choices, which may make it seem like an easy task. In reality, when we try a large (and perhaps new) NN architecture on a large dataset, the number of hyper-parameter choices can be huge, and tuning them becomes a really difficult task.

That’s it for your points. Now, let’s see why one may consider a traditional ML model rather than a NN:

  • The first reason is training time. As NNs scale up, their training time also increases, and for many tasks such training times cannot be accepted, for example because of the excessive cost of training in the cloud, or because the models have to be retrained periodically to counter data drift. In such scenarios, a traditional ML model might be a more suitable choice.
  • The second reason is, as you mentioned, inference time. Many traditional ML models offer faster inference than NNs, at the cost of some accuracy.
  • The third reason is storage requirements. As the size of a NN grows, so does the storage required to keep it around for inference. This is a clear disadvantage if we want to use a model on edge devices without relying on the cloud, for reasons such as connectivity, privacy, etc.

I guess a clear trend is that when the data involved is unstructured (such as images, videos, text, geospatial data, etc.), the use of NNs is much more common, and they have done wonders there indeed. But when the data is structured (primarily tabular), the use of traditional ML models is clearly dominant. That is perhaps one of the major reasons why a typical data scientist’s routine involves much more use of traditional ML models than deep learning models, since a majority of them deal with tabular data.

At the same time, I would also like to note that active research is going on to improve NNs in all of the aforementioned aspects, and there has been some amazing work in the last decade that improves NNs to a great extent in all of them.

I hope this helps.

Cheers,
Elemento

7 Likes

Hi there,

Deep neural networks are great, especially for high-dimensional and rather unstructured data (like pictures, video frames, …). Benefits like transfer learning can also be really great!

But interpretability and explainability are often a crucial business requirement. You might not only want to quantify statistically how certain the model is; you may also want to know how and why the model came to its conclusion.

So if you already have very good domain knowledge and can incorporate it into your features, resulting in a smaller feature space (e.g. 5 dimensions or so), another model may meet your requirements even better than a NN, with less data (which is usually much more cost-efficient, considering your labelling strategy).

Let’s take Gaussian processes (GPs) as an example.
There are a few reasons why one might choose a GP over a deep neural network (DNN): interpretability and explainability, the opportunity to incorporate prior knowledge into the model (via the kernel), and the fact that a GP provides uncertainty estimates in addition to its predictions.
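
To make this a bit more tangible, here is a tiny, purely illustrative sketch with scikit-learn; the toy 1-D data only stands in for a small, hand-engineered feature space:

```python
# Illustrative sketch: a Gaussian process regressor on a small, low-dimensional
# feature space, returning a prediction plus an uncertainty estimate.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Toy 1-D data; in practice these would be your hand-engineered domain features.
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 25).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.normal(size=len(X))

# The kernel encodes prior knowledge (here: a smooth function with a learned length scale).
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0)
gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-2, normalize_y=True)
gp.fit(X, y)

# Predictions come with an uncertainty estimate "for free".
X_new = np.array([[2.5], [7.5]])
mean, std = gp.predict(X_new, return_std=True)
print(mean, std)
```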

Note: as Andrew pointed out, the model itself might not be your biggest lever. Often improving the data quality (data-centric AI, so to speak) can be the stronger lever to excel in your application, considering business requirements; see also: A Chat with Andrew on MLOps: From Model-centric to Data-centric AI - YouTube

Best regards
Christian

2 Likes

NNs are a great tool for certain classes of problems, but they are not great for everything.

For one, the NNs taught in this course cannot handle predictions where sequences of events are important: they are batch processors, so the order of the examples doesn’t matter.

One method that handles time sequences is a Recurrent Neural Network.

It’s covered in a fair amount of depth in the Deep Learning Specialization.
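
Just as an illustration (not code from the course), here is a minimal recurrent model in Keras that consumes an ordered sequence of timesteps; the shapes and layer sizes are arbitrary assumptions:

```python
# Minimal illustrative sketch: a recurrent layer whose hidden state carries
# information across timesteps, so the order of the inputs matters.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(30, 1)),   # 30 timesteps, 1 feature per step
    tf.keras.layers.SimpleRNN(16),   # hidden state is updated step by step
    tf.keras.layers.Dense(1),        # e.g. predict the next value in the sequence
])
model.compile(optimizer="adam", loss="mse")
# model.fit(X_seq, y_next, epochs=10)  # X_seq shape: (num_examples, 30, 1)
```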

1 Like

Hi @Elemento, @Christian_Simonis and @TMosh ,

Thanks for your detailed answers. I understand now that there are quite a few reasons to choose a classical ML model instead of a NN. However, in most cases the actual performance of the model (in terms of prediction error or accuracy) does not seem to be the main reason for choosing a different model.

I therefore ask myself whether, in practice, NNs may be used to sort of “set a benchmark” for the model of choice. For example, in the MLS course Andrew Ng talks about comparing the model performance (i.e. the cost function) to some baseline, which might come from human performance or from a competing model.

Let’s say I already know that a NN will not be a good choice, because I need performant inference on edge devices. I could nevertheless train a NN to set an accuracy benchmark. Then I already know that the data is good enough to reach that benchmark, and I can say that my classical ML model should at least come close to it.

Is that an approach that you have seen in practice? I am asking because I sometimes don’t know whether the data is even good enough to reach a certain desired accuracy, but once I know that a NN can do it, I won’t have that excuse …

Best regards
Matthias

1 Like

Hi there,

If I understand your question correctly, you are asking whether having a benchmark makes sense, especially considering the computational limits of your embedded or edge device in the target deployment scenario.

Yes, there are cases where this makes sense! Especially if you need to make architectural decisions (e.g. which functions are executed in the cloud vs. on the edge, when new training will be triggered, etc.) and you have the right data ready, this speaks in favour of doing so. Also, if there is a potential upside for your business problem, e.g. higher accuracy or smaller uncertainty, this might be a reason. After all, you can then assess what seems to be technically possible and reasonable.

I am a big fan of having fair benchmarks. For time-series prediction in particular, these can include (a small sketch follows the list):

  • a naive prediction (like a constant forecast or a linear model) as well as
  • AutoML.
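
Just as a rough, illustrative sketch of what such baselines could look like (with a synthetic toy series standing in for real data):

```python
# Illustrative sketch: two cheap time-series baselines -- a naive "persistence"
# forecast and a linear trend model -- evaluated on a held-out tail of the series.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=120))       # toy random-walk series; use your own data
train, test = series[:100], series[100:]

# Baseline 1: naive persistence -- predict the last observed training value.
naive_pred = np.full_like(test, train[-1])

# Baseline 2: linear trend fitted on the time index.
t_train = np.arange(len(train)).reshape(-1, 1)
t_test = np.arange(len(train), len(series)).reshape(-1, 1)
linear_pred = LinearRegression().fit(t_train, train).predict(t_test)

print("naive MAE: ", mean_absolute_error(test, naive_pred))
print("linear MAE:", mean_absolute_error(test, linear_pred))
# Any fancier model (AutoML, a NN, ...) should clearly beat these numbers to justify its cost.
```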

That being said, depending on the problem you want to solve, you can choose other appropriate benchmarks: these may even be domain models (e.g. OpenCV models for computer vision) or, as you suggested, deep learning models. Please bear in mind to consider effort and cost, especially if data quantity is limiting.

Do you have an application or example in mind where you would use a DL model as benchmark?

Best
Christian

Let me describe the following example:

Given a set of measures that have been taken to fight the C virus, generate a model that predicts the “success” of these measures.

So I might collect data from several sources, like the OWID data set, the Oxford Stringency Index, data about vaccinations, population “compliance with measures” data, weather data, population density data, and so forth.

I might then decide whether to create a regression model (for example predicting “C deaths per million” or “excess mortality in percent”) or a sort of classification model (mapping such numbers to “bad”, “medium”, “good” or something like this).

Now, when I begin such a process, I do not even know how much information is really contained in my data to calculate such “predictions” (which are in fact “ex-post predictions” in this case). Maybe all the features for which I can find data do not even come close to the real causes of death, and the whole analysis would be quite useless.

So, to set a benchmark, I would like to train a NN and see how far I can get. However, the result of a NN isn’t something that can be explained to a broader public; in particular, it does not give clear information about which features really had the most influence on the outcome. So, after seeing that I can produce good results with my NN, I would try to find a model that gives more valuable information about what really happened. Some kind of regression model or decision tree, for example, would give me information about influential features, which can also be communicated.
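
To illustrate what I have in mind, here is a rough sketch with a synthetic stand-in dataset (I have not assembled the real features yet, so every number here is a placeholder):

```python
# Illustrative sketch of the proposed workflow: use a NN to gauge how much signal
# the data supports, then fit an interpretable model and inspect its features.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

# Synthetic stand-in for the assembled tabular dataset (OWID, stringency index, ...).
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1) NN benchmark: roughly how much signal do the features contain at all?
nn = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
nn.fit(X_train, y_train)
nn_score = r2_score(y_test, nn.predict(X_test))

# 2) Interpretable model: should come reasonably close to the benchmark,
#    and its splits / feature importances can be communicated.
tree = DecisionTreeRegressor(max_depth=4, random_state=0)
tree.fit(X_train, y_train)
tree_score = r2_score(y_test, tree.predict(X_test))

print(f"NN benchmark R^2: {nn_score:.3f}  |  decision tree R^2: {tree_score:.3f}")
print("Tree feature importances:", tree.feature_importances_)
```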

Does that approach sound reasonable for such an example?

Best regards
Matthias

“Training an NN” is not a do-all solution. It’s just one possible method.
You need an independent means to decide how good is “good enough”.

I think you’re putting too much emphasis on the NN as a mythical gold standard.

Hi there,

I see, thanks for clarifying.
I believe in your example it would in fact make sense to train the DNN, at least as a benchmark; maybe it could even turn out to be your favoured model for deployment. So it would be one possible method in your tool set!

Two notes in addition:

  • As N. Bohr said, “Prediction is very difficult, especially if it’s about the future” :wink:. In your example, the strong distribution shift (different C variants, which might change in severity over time) needs to be taken into account, but I get your point.
  • If you need to calculate feature importance, there are ways to do it; see this example for classic ML (a small sketch follows below). Heat-map or dropout-based approaches can potentially also be useful for deep learning applications.
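
For instance, a small, purely illustrative sketch of permutation importance with scikit-learn (toy data standing in for your real features):

```python
# Illustrative sketch: permutation importance for a classic ML model.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Toy data standing in for the real feature set.
X, y = make_regression(n_samples=400, n_features=6, n_informative=3, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

model = RandomForestRegressor(n_estimators=200, random_state=1).fit(X_train, y_train)

# Shuffle each feature in turn and measure how much the held-out score drops.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=1)
for i, (mean, std) in enumerate(zip(result.importances_mean, result.importances_std)):
    print(f"feature {i}: importance {mean:.3f} +/- {std:.3f}")
```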

Best regards
Christian

Hi,

A distribution shift in this example might result from several causes. Whether a distribution shift is caused by variants is at least questionable (it’s a claim of virologists, not a fact). However, rising immunity in the population, especially from natural infection, is probably a highly influential factor. It would of course be desirable to include data about such factors (i.e. data about the dominating variants, serological data concerning immunity, and so on).

I really wonder why there has been no collective effort in the data science community so far to produce such models. At least I found none on Kaggle.

Best regards
Matthias

Well, lacking some other “gold standard”, I am just looking for something that might help guide the process. I would not call it mythical, but I get your point.

1 Like

For guiding the process of developing AI models, the CRISP-DM model is worth mentioning; see also this example for some practical guidance. Feel free to take a closer look at it, since it has been a standard for data mining for several years, even though it might not be perfect.

With respect to a NN as a possible “gold standard”, and in addition to the previous discussion above, I recommend taking a look at the no-free-lunch theorem:

In statistics, the term has been used to describe the tradeoffs of statistical learners (e.g., in machine learning) which are unavoidable according to the “No free lunch” theorem. That is, any model that claims to offer superior flexibility in analyzing data patterns usually does so at the cost of introducing extra assumptions, or by sacrificing generalizability in important situations.

Source

Happy holidays!

Best
Christian

3 Likes

Neural networks are great. They are universal function approximators: in theory, a NN can give superior results compared with any other method, given that you have the right amount of data and the right architecture.

The issue is that you do not always have the right amount of data, nor do you always have the right architecture. NNs are especially data hungry and require much more data than other methods. Sometimes the improvement does not justify the cost of collecting that data. In other cases it is hard to train the NN, especially when the architecture gets really complex, which again brings us back to the cost/performance trade-off.

On the other hand, other machine learning methods require less data, and in many cases the difference in performance is negligible. In some cases machine learning methods can even exceed the performance of a NN; this has been shown repeatedly when the data is tabular and structured.

The bottom line here is that while NNs are the hot and cool option and have seen a lot of attention lately (especially with the spread of …), other machine learning methods can be just as good.

1 Like

Why do tree-based models still outperform deep learning on typical tabular data?

1 Like

Thank you @tsvika_greener!