How to approach a Data Science problem?

How should one approach a Data Science or Machine Learning problem in a domain one doesn’t know much about? Take a medical disease problem or a sports problem, for example. How can one know which features will be essential? I know that feature engineering is an essential skill, but what if someone is gathering all the data from scratch and, for various reasons, isn’t using publicly available data? How can one know a priori which types of data or features could play a key role in solving the problem?

I recommend you get assistance from someone who has knowledge about the subject you are studying.


What @TMosh advised is the first and most crucial step. Then, once you have data, you can perform feature analysis to find which features contribute most to the output. Kaggle’s free tutorials give some information on that process.
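
To make "feature analysis" a bit more concrete, here is a minimal sketch of one very simple form of it: ranking features by the absolute Pearson correlation with the target. The data is synthetic and the feature names (`age`, `dose`, `noise`) are made up for illustration. Correlation only picks up linear, one-feature-at-a-time relationships, so treat it as a first look and not as a final answer.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Synthetic data (an assumption for illustration): the target depends
# strongly on "age", moderately on "dose", and not at all on "noise".
rng = random.Random(0)
n = 500
features = {
    "age": [rng.gauss(0, 1) for _ in range(n)],
    "dose": [rng.gauss(0, 1) for _ in range(n)],
    "noise": [rng.gauss(0, 1) for _ in range(n)],
}
target = [
    2.0 * a + 1.0 * d + rng.gauss(0, 0.5)
    for a, d in zip(features["age"], features["dose"])
]

# Rank features from most to least correlated with the target.
ranking = sorted(
    ((name, pearson(col, target)) for name, col in features.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranking:
    print(f"{name:>6}: r = {r:+.3f}")
```

On real data you would replace the synthetic columns with your own, and a domain expert can help judge whether a high or low score actually makes sense.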

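If it’s a regression problem, using L1 and L2 regularization can help to automatically separate the important features from the less important ones. L1 can drive the weights of unhelpful features to exactly zero, while L2 only shrinks them toward zero.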

Regularization will help eliminate the features that don’t contribute much to the prediction of the label.
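
As a rough illustration of how regularization can eliminate features, here is a small sketch of L1-regularized linear regression (lasso) fitted by coordinate descent in plain Python. The data is synthetic, and the function names and the choice `alpha=0.1` are assumptions for the example. In practice you would more likely use a library implementation such as scikit-learn's `Lasso`. Note that it is the L1 penalty that sets weights to exactly zero; an L2 penalty would shrink them without usually zeroing them.

```python
import random

def soft_threshold(value, threshold):
    """Shrink value toward zero; return 0.0 if it lies within the threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iters):
        for j in range(d):
            rho = 0.0  # feature j against the residual that ignores feature j
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n)
    return w

# Synthetic data (an assumption for illustration): only the first two of
# the four features actually influence the label.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0, 0.1) for row in X]

weights = lasso_coordinate_descent(X, y, alpha=0.1)
for j, w_j in enumerate(weights):
    status = "kept" if w_j != 0.0 else "eliminated"
    print(f"feature {j}: weight = {w_j:+.3f} ({status})")
```

The strength `alpha` controls how aggressively features are dropped, so it is normally tuned (for example with cross-validation) and not fixed by hand as it is here.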

Do you mean help from someone well-versed in that field within the organization, or just from any expert in general?

Yeah, regularization is helpful in almost all classical ML problems, for that matter, as far as I understand. Even in XGBoost, regularization helps a lot. Same with dropout for neural networks.

An expert in that field.