Classical machine learning baselines in the era of deep learning?

Hi there.

Q: How often do deep learning practitioners today start their projects with simpler, classical machine learning algorithms (like random forests or logistic regression), and move to deep nets later (if needed)? Or is it more typical now to start with neural nets as a baseline model?

For sure it depends on the problem: the kind of data, the problem’s complexity and novelty, interpretability requirements, deployment constraints, etc. It also depends on the practitioner’s comfort with classical vs. deep learning frameworks.

But Andrew’s compelling points make me think that, these days, it may ultimately be less of a headache to start with a neural net as the baseline?

More context for my question
Take even a multiclass text classifier, like the typical toy problem of classifying the 20 Newsgroups dataset.

I’ve heard mixed advice on this. On a very similar problem, a metadata classifier, a senior ML practitioner told me I should start with classical algorithms like random forests, logistic regression, naive Bayes, etc.: “That will probably be all you need.” I have seen how you can get a strong baseline with a random forest using mostly default hyperparameters (something like the sketch below). And deep learning seems to introduce more complexity (-> more debugging) and heavier computational requirements (-> more overhead) than scikit-learn-style approaches. So starting classical (‘simpler’) makes sense.
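
For concreteness, here’s a minimal sketch of the kind of classical baseline I mean, assuming a TF-IDF + random forest pipeline on the 20 Newsgroups data. The hyperparameters are illustrative, not tuned:

```python
# A classical baseline on 20 Newsgroups: TF-IDF features + a random forest with
# mostly default hyperparameters. All numbers below are illustrative, not tuned.
from sklearn.datasets import fetch_20newsgroups
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Load the training split, stripping metadata that can leak the labels.
data = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))

baseline = make_pipeline(
    TfidfVectorizer(max_features=20000, stop_words="english"),
    RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0),
)

# Cross-validated accuracy gives a quick reference point to beat later.
scores = cross_val_score(baseline, data.data, data.target, cv=3, scoring="accuracy")
print(f"RF baseline accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```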

But Andrew’s great course makes me unsure about this strategy now, for two reasons:

  1. He makes a compelling case that the workflow is actually simpler with deep nets, because they make it easier to orthogonalize the tuning process than classical algorithms do. So while a DNN may require more startup time and overhead, the actual optimization of bias and variance could be easier and more straightforward. It makes the process sound more certain, more purposeful, and even a little calmer than the classical workflow I’m used to. That’s VERY appealing!

  2. If you start classical (with, say, a random forest) and later want to try deep learning, is that transition awkward? How does it look in reality? Does a lot of your work to date not carry over to the deep learning iterations, so that in some areas you have to start over with data preprocessing/transformation, bias and variance optimization, etc.? And do you have to rewrite a lot of workflow code from scratch? Or do the scikit-learn wrappers for Keras, for example, let you drop your DNN into the same CV and evaluation workflow you were using for your classical algorithms, with pretty good reliability (see the sketch after this list)?
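
To illustrate that second question, here’s a hedged sketch of what reusing the same cross-validation workflow could look like, assuming the SciKeras wrapper (`scikeras.wrappers.KerasClassifier`). The `build_mlp` function, layer sizes, epochs, and `max_features` are placeholders for illustration, not recommendations:

```python
# Reusing the same scikit-learn CV workflow with a small Keras net, via the SciKeras
# wrapper (pip install scikeras tensorflow). Architecture and hyperparameters are
# placeholders for illustration only.
from scikeras.wrappers import KerasClassifier
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from tensorflow import keras


def build_mlp(meta):
    # SciKeras fills `meta` with shapes it infers from the data at fit time.
    n_features = meta["n_features_in_"]
    n_classes = meta["n_classes_"]
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(256, activation="relu"),
        keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


data = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))

# Same pipeline shape as the classical baseline: only the final estimator changes.
dnn = make_pipeline(
    TfidfVectorizer(max_features=5000, stop_words="english"),
    FunctionTransformer(lambda X: X.toarray(), accept_sparse=True),  # dense input for the MLP
    KerasClassifier(model=build_mlp, epochs=3, batch_size=128, verbose=0),
)

scores = cross_val_score(dnn, data.data, data.target, cv=3, scoring="accuracy")
print(f"DNN accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

In this sketch only the final estimator in the pipeline changes; the vectorizer, the `cross_val_score` call, and the scoring stay the same as in the classical baseline above. Whether that carries over as cleanly in a real project is part of what I’m asking.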

I’d appreciate advice and modern perspectives on this. Thank you!

Hey @cparmet,

As you have already pointed out, the task determines the choice of model. It is perfectly fine to start with classical ML and then switch to DL, or back :slight_smile:

The rule of thumb is to implement a baseline model fast, because once we have a model, we have something to compare against.

That’s helpful, @manifest, and the rationales make perfect sense. The example of switching back to classical is instructive too. Thanks for your insights!