Adding Features or Collecting More Data?

Hi,
I am confused about adding features versus collecting more data. I don't understand the difference between these two concepts.
Could you please give some explanations?

Thank you,

Nhan

Hi Nhan,

adding features means adding extra data points to each observation, e.g. mathematical combinations of existing features that are more meaningful for what you want to predict. For example, in a credit application where you want to predict credit defaults and already have the applicant's monthly income and monthly spending, you might add the ratio and the difference of the two, which gives you two more meaningful features. You can also add features by joining external data that is related to each observation.
Collecting more data means adding observations: extending the data history with newer or older observations, or adding data that was collected under different conditions or in a different context.
Better features tend to give you models with higher predictive power, whereas more data can make your model more robust to changing conditions.
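
Here is a minimal sketch of the credit example, in plain Python. The field names (`monthly_income`, `monthly_spending`) and the values are made up for illustration. Note that the number of observations stays the same; only the number of columns per observation grows.

```python
# Hypothetical applicants; field names and values are illustrative only.
applicants = [
    {"monthly_income": 4000.0, "monthly_spending": 3000.0},
    {"monthly_income": 2500.0, "monthly_spending": 2600.0},
]

def add_derived_features(row):
    """Return a copy of the row with ratio and difference features added."""
    income = row["monthly_income"]
    spending = row["monthly_spending"]
    enriched = dict(row)
    # Share of income that is spent; undefined (None) if income is zero.
    enriched["spending_to_income_ratio"] = spending / income if income else None
    # Money left over each month (negative means overspending).
    enriched["income_minus_spending"] = income - spending
    return enriched

# Same observations as before, each with two extra features.
enriched_applicants = [add_derived_features(row) for row in applicants]
```

Collecting more data would instead mean appending new rows to `applicants`.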

Hope this helps,
Nils

Nice explanation @Nils,

I would also establish a baseline first; this can help you pick out only the features that are actually useful.

Hi Nils,

Thanks for your nice explanation.

However, I am wondering whether it is possible to add more features to unstructured data (e.g. images) without collecting more data. So far, is data augmentation (rotation, brightness, contrast, etc.) the only way to improve prediction?

Thank you so much,
Nhan

Hi Nhan,

transforming the images to generate new images is the main method for image data that I know of.
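
As a rough sketch of what such transformations do, here is a toy version in plain Python. It assumes a grayscale image stored as nested lists with pixel values from 0 to 255; the helper names are made up, and for real images you would normally use an image library instead.

```python
# A tiny 2x3 grayscale "image" as nested lists, pixel values in 0..255.
image = [
    [10, 20, 30],
    [40, 50, 60],
]

def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def rotate_90_clockwise(img):
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def adjust_brightness(img, delta):
    """Add delta to every pixel, clipping to the valid 0..255 range."""
    return [[max(0, min(255, p + delta)) for p in row] for row in img]

# Each transformed copy is a new training example with the same label.
augmented = [
    flip_horizontal(image),
    rotate_90_clockwise(image),
    adjust_brightness(image, 200),
]
```

Whether a given transformation is safe depends on the task: a flip or rotation should only be used if it does not change the label.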
One can also use feature descriptors like HOG, SIFT and SURF, which can be seen as a kind of feature engineering: they compress important image information into a feature vector.
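
To give a feeling for what a descriptor does, here is a heavily simplified sketch of the core idea behind HOG: a histogram of gradient orientations weighted by gradient magnitude. This is not full HOG (no cells, blocks, or normalization), and the function name and bin count are my own choices for illustration.

```python
import math

def orientation_histogram(img, bins=4):
    """Histogram of gradient orientations weighted by gradient magnitude.

    Simplified illustration only: one histogram for the whole image,
    without the cells, blocks, and normalization of full HOG.
    """
    height, width = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences approximate the image gradient.
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            bin_index = min(int(angle / (180.0 / bins)), bins - 1)
            hist[bin_index] += magnitude
    return hist

# A 4x4 image with a vertical edge: dark left half, bright right half.
edge_image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]
# All gradient energy lands in the first bin (horizontal gradients).
features = orientation_histogram(edge_image)
```

The resulting vector has a fixed length regardless of image size, which is what lets it serve as input to a classical model.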

Nils

Hi Nils,

Thank you very much.
Since I cannot collect more image data, I have to find other ways to enlarge my dataset. Your answer is just what I was hoping for.

Best regards,
Nhan