I’m just starting in AI and am currently enrolled in the course “AI4E”.
I have a general question, and I think the answer will become more obvious with time and experience.
The question is: how do we know which input parameters (A) to feed into an ML algorithm to predict the output (B)?
In the example provided by Dr. Andrew, he discussed the following:
To predict the demand for t-shirts in your shop, we need the price of the t-shirt, shipping costs, marketing, and material. How do we know that these are the inputs we need to get the best prediction accuracy? I’m asking not only about this example, but about any ML project.
The best person to answer that question is the SME (subject matter expert). Sometimes we can make an educated guess ourselves if we are familiar with the domain, as with the t-shirt example or predicting house prices, since most of us have bought a t-shirt or been involved in buying a house at one point or another. But in domains we are not familiar with, such as industrial applications, we absolutely must rely on the SME.
If an SME isn’t available, the next best thing is to check whether there are existing research papers in the field. You’ll get an idea of what kinds of features others have used; you could begin from there and improvise further.
Also, you can use techniques like feature selection to check which features are most relevant to the output. This might be an easier approach, especially if you don’t have access to an SME, but do remember that feature selection techniques rely on statistical properties of the data. If you use them, make sure you have multiple datasets to validate the approach; otherwise the algorithm may have picked up on some stray property or noise that was specific to a given dataset and doesn’t generalize.
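To make the "validate on multiple datasets" point concrete, here is a rough sketch in plain Python. Everything in it is an assumption for illustration: the data is synthetic, the feature names (`price`, `marketing`, `shelf_colour`) and the helper functions are made up, and ranking by absolute Pearson correlation is just one simple filter-style selection method among many. The idea is to keep only the features that rank highly on every dataset, not just on one.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def make_dataset(seed, n=500):
    """Synthetic t-shirt data (hypothetical): demand depends on price and
    marketing, while shelf_colour is pure noise."""
    rng = random.Random(seed)
    features = {"price": [], "marketing": [], "shelf_colour": []}
    demand = []
    for _ in range(n):
        price = rng.uniform(5, 20)
        marketing = rng.uniform(0, 10)
        features["price"].append(price)
        features["marketing"].append(marketing)
        features["shelf_colour"].append(rng.uniform(0, 1))
        demand.append(100 - 3 * price + 4 * marketing + rng.gauss(0, 2))
    return features, demand


def rank_features(features, target):
    """Feature names sorted by |correlation with target|, strongest first."""
    scores = {name: abs(pearson(vals, target)) for name, vals in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


def stable_top_k(datasets, k):
    """Features that land in the top k on every dataset."""
    tops = [set(rank_features(feats, target)[:k]) for feats, target in datasets]
    return set.intersection(*tops)


# Three independently generated datasets stand in for "multiple datasets".
datasets = [make_dataset(seed) for seed in (0, 1, 2)]
selected = stable_top_k(datasets, k=2)
print(sorted(selected))
```

A feature that only shows up in the top k on one of the datasets would be dropped by the intersection, which is the kind of dataset-specific fluke you want to filter out. Note that simple correlation only catches linear relationships, so treat this as a starting point.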
If sufficient data isn’t available (or the dataset is small), then at least make sure you are doing cross-validation (which you should do anyway) so that you can still have some degree of confidence.
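For anyone who hasn’t seen cross-validation in code, below is a minimal k-fold sketch in plain Python. The data is synthetic, the one-feature least-squares line is just a stand-in model, and the function names (`fit_line`, `k_fold_mae`) are hypothetical. In a real project you would more likely use a library routine, but the mechanics are the same: hold out each fold in turn, train on the rest, and look at how much the error varies between folds.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def k_fold_mae(xs, ys, k=5, seed=0):
    """Mean absolute error on each of k held-out folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal parts
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        fold_errors = [abs(ys[i] - (slope * xs[i] + intercept)) for i in held_out]
        errors.append(statistics.fmean(fold_errors))
    return errors


# Small synthetic dataset (hypothetical): demand falls as price rises.
rng = random.Random(42)
prices = [rng.uniform(5, 20) for _ in range(30)]
demand = [100 - 3 * p + rng.gauss(0, 2) for p in prices]

scores = k_fold_mae(prices, demand, k=5)
print("MAE per fold:", [round(s, 2) for s in scores])
print("mean:", round(statistics.fmean(scores), 2),
      "stdev:", round(statistics.stdev(scores), 2))
```

If the per-fold errors are similar, that gives some confidence the model is not just fitting quirks of one split; a large spread between folds is a warning sign, especially on small datasets.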
I find Max Kuhn’s book on feature engineering very enlightening for this purpose. The examples are in R, but the theory it covers is extremely useful.
Great explanation. Thank you very much. It is indeed helpful in putting things in perspective and in showing how to approach different ML projects. I came across feature selection and cross-validation years ago. I will go over them again.