On the question of when to go for classic ML versus deep learning: I'm afraid there is no crystal-clear decision boundary, but in general the points to consider are:
- the more unstructured your data is (e.g. images, video, text, …)
- the more complex and abstract your problem is (e.g. face recognition in a video sequence)
- the larger your dataset is (hopefully with high-quality labels)
… the higher the potential of deep learning should be. DNNs with advanced architectures (such as transformers, but also architectures with convolutional and pooling layers) are designed to perform well on very large datasets and to process highly unstructured data like pictures or videos in a scalable way. Basically: the more data, the merrier!
Compared to classic ML models, DNNs build in fewer assumptions about the form of the relationship between inputs and outputs, so they can learn more complex and abstract relationships given data of sufficient quality and quantity; see also this thread.
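If you want to check this on your own data rather than rely on rules of thumb, one rough approach is to train a classic model and a neural network on growing subsets of the training data and compare held-out accuracy. The snippet below is only a minimal sketch under a few assumptions of mine: scikit-learn is installed, its bundled digits dataset stands in for your data, a random forest represents "classic ML", and a small MLP stands in for a DNN (it is not a real deep architecture). The function name `compare_at_sizes` and all hyperparameters are illustrative choices, not recommendations.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_at_sizes(train_sizes=(100, 400, 1200), seed=0):
    """Return {train_size: {"classic_ml": acc, "neural_net": acc}} on held-out data.

    Hypothetical helper for illustration; swap in your own data and models.
    """
    # Small image dataset: 8x8 digit images flattened to 64 features each.
    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )

    results = {}
    for n in train_sizes:
        # Use only the first n training samples to mimic a smaller dataset.
        X_sub, y_sub = X_train[:n], y_train[:n]
        models = {
            "classic_ml": RandomForestClassifier(n_estimators=100, random_state=seed),
            # Assumption: a single-hidden-layer MLP as a cheap stand-in for a DNN.
            "neural_net": make_pipeline(
                StandardScaler(),
                MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=seed),
            ),
        }
        # Fit each model on the subset and score it on the same held-out test set.
        results[n] = {
            name: model.fit(X_sub, y_sub).score(X_test, y_test)
            for name, model in models.items()
        }
    return results


if __name__ == "__main__":
    for n, scores in compare_at_sizes().items():
        print(n, {name: round(acc, 3) for name, acc in scores.items()})
```

How the two curves move relative to each other as the training size grows gives you a rough, data-specific hint about which family is worth investing in. A toy dataset like this one will not necessarily show the pattern you would see on large, truly unstructured data.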
Hope that helps!
Best regards
Christian