In the course, he mentioned that if you don’t have enough training data, traditional algorithms like an SVM could perform better than a CNN, and that this depends on the hand-crafted features.
I want to know what he means by hand-crafted features, and also whether traditional models can truly be more efficient, especially for object detection tasks?
Hi there,
A hand-crafted feature means that you incorporate your domain knowledge into your modelling via the features you use as model input. This could mean that you:
- apply some mathematical operations like transformations (e.g. a Fourier transform if you have steady states and an oscillating system) or just simple addition / multiplication / raising to a higher power, etc.
- or you could even use more sophisticated models whose output can serve as input for your ML model
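To make this more concrete, here is a minimal sketch (purely illustrative, with made-up toy signals and labels) of what incorporating domain knowledge into the features could look like: instead of feeding raw signals to a model, you derive a few interpretable quantities (simple statistics plus Fourier-based features) and train a classic SVM on them.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data: each row is a noisy 1-D measurement of an oscillating system.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 256)
freqs = rng.uniform(1, 10, size=200)
X_raw = np.stack([np.sin(2 * np.pi * f * t) + 0.1 * rng.normal(size=t.size)
                  for f in freqs])
y = (freqs > 5).astype(int)  # toy label: "high-frequency" vs. "low-frequency" signal

def hand_crafted_features(signals):
    """Domain-inspired features: basic statistics plus dominant FFT magnitudes."""
    spectra = np.abs(np.fft.rfft(signals, axis=1))
    return np.column_stack([
        signals.mean(axis=1),     # average level
        signals.std(axis=1),      # overall variability
        spectra.argmax(axis=1),   # index of the dominant frequency
        spectra.max(axis=1),      # strength of that frequency
    ])

X = hand_crafted_features(X_raw)    # low-dimensional, interpretable inputs
clf = SVC(kernel="rbf").fit(X, y)   # classic model on the engineered features
```

The point is that the feature function encodes what you already know about the system (here: that the frequency content matters), so a relatively simple model can work with few dimensions.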
What is better for you really depends on the task you want to solve. Here you can find an example where a simple model might be sufficient to model the expected value of a time series problem: Bias and variance, tradeoff - #2 by Christian_Simonis
Models with hand-crafted features are often powerful, in my experience, if you have dimensional spaces of < 18, quite good domain knowledge (encoded in your features), and at least a moderate amount of data to satisfy your model’s needs.
In object detection you also have some classic models like these from OpenCV: OpenCV: Other tutorials (ml, objdetect, photo, stitching, video)
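For example, one classic approach that ships with OpenCV is a Haar cascade: hand-crafted Haar-like features combined with a boosted classifier, no deep learning involved. A minimal sketch (assuming opencv-python is installed and some placeholder image file `your_image.jpg` exists) could look like this:

```python
import cv2

# Pre-trained Haar cascade for frontal faces, bundled with opencv-python.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

img = cv2.imread("your_image.jpg")            # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # cascades operate on grayscale

# Returns a list of (x, y, w, h) boxes around detected faces.
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("detections.jpg", img)
```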
But in my understanding, hand-crafted features reach their limits, especially if you want to reuse a model (e.g. with transfer learning) in a slightly different context or application scenario.
So deep learning (DL) is a very powerful way to do object detection: you have (due to the characteristics of a picture or video) high-dimensional spaces and tons of good training data. Also, the right model architectures have been developed over the past years to solve this kind of task. DL models basically take care of “feature engineering” implicitly in their more sophisticated layers in order to optimise the underlying cost function. The good thing is: you will learn all this in the DLS specialisation.
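As a small illustration of what “implicit feature engineering” means in code (a generic sketch, not any specific architecture from the course): in a CNN like the one below, the convolutional layers learn their own filters directly from the raw pixels while the cost function is optimised, so no features are crafted by hand.

```python
import tensorflow as tf

# A small, generic CNN: the Conv2D layers learn filters (edges, textures, ...)
# from raw pixels during training - that is the "implicit feature engineering".
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),  # low-level features
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),  # more abstract patterns
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),    # e.g. cat vs. no cat
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```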
Long story short: in object detection I am not aware of a robust and highly scalable solution which is purely based on classic ML methods with hand-crafted features AND outperforms DL-based models.
Best
Christian
Thank you for your response.
Best.
Thanks for the nice explanation, Christian!
Could you please tell me: can machine-crafted features have some “observable meaning” (e.g. the number of rooms in house price prediction), or will they mostly be some artificial mixture of the hand-crafted input features?
My best regards, Vasyl.
Hi @vasyl.delta
I guess you are referring to this:
Features learned by DL often lack interpretability and explainability, and often do not have a physical meaning or a direct domain meaning; this also applies, e.g., to embeddings in some hidden layers.
As a theoretic example for DL:
… the model could learn how edges and contours make up a “paw” or “whiskers” or other features that are important to identify a cat. Low-level features like edges are hierarchically combined and enhanced to describe more advanced patterns and finally form objects, which contribute to the classification of whether we see a cat in the picture or not.
see also this thread: New 1000 images after model development (train/dev/test), where to add? - #11 by Christian_Simonis
In reality, things are not always so easy to interpret when it comes to DL, and in general the explainability of hand-crafted features with a physical meaning is usually clearly higher.
Also, this thread might be relevant for you:
Best regards
Christian
Thank you so much, Christian! Very clear answer. Will look through it.
If possible, one more question on this topic.
Consider an example where we predict the price of a house and the input parameters are the width and length of the house. Would it be possible for a deep neural network to generate the feature “area of the house” = width * length? I can hardly imagine how that could be done using standard activation functions like ReLU, etc.
Could you please clarify my doubts?
My guess is that it wouldn’t really need to do that. There should be some pretty strong correlation between the length and width of a house, meaning that it’s very likely that a house that’s very long is also at least relatively wide. Also, the network sees all inputs at each neuron of each layer, so any of the neurons in the first layer that get both the length and width as inputs could learn high weights associated with those.
But you, as the architect of the network, also get to choose what is contained in your input data, right? So if you are ingesting real estate listings, notice that those always give the square footage (area) and the lot size (also an area), but I don’t recall offhand whether they give the individual dimensions of the houses. If your data does not include the area, then you could create it as a “hand-engineered” feature by doing the multiplication in your data preprocessing step, as in the small sketch below. Of course, total square footage is a little more complicated than that in a multi-story structure.
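Here is a tiny illustration of that preprocessing step (the column names and values are made up):

```python
import pandas as pd

# Hypothetical listings data with individual dimensions per house.
listings = pd.DataFrame({
    "width_m":  [8.0, 10.5, 6.5],
    "length_m": [12.0, 15.0, 9.0],
    "price":    [350_000, 520_000, 240_000],
})

# Hand-engineered feature: footprint area = width * length,
# computed once in preprocessing instead of hoping the network derives it.
listings["area_m2"] = listings["width_m"] * listings["length_m"]

X = listings[["width_m", "length_m", "area_m2"]]
y = listings["price"]
```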
Thank you very much for the answer!