Nope, @Nimish_Khandelwal.
If the targets are labelled correctly, we should not manipulate them by clipping or winsorizing. But you can of course consider transforming the labels as a whole (e.g. with log scaling), training the model to predict the transformed label, and then inverting the transformation to recover your actual label.
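A minimal sketch of that transform-train-invert pattern, using synthetic, right-skewed data as a stand-in for real labels:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: a positive, right-skewed target (stand-in for e.g. premiums).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.exp(0.3 * X[:, 0] + rng.normal(scale=0.2, size=200))

# Train on the log-transformed label instead of clipping/winsorizing it.
model = LinearRegression().fit(X, np.log(y))

# Invert the transformation to get predictions on the original scale.
y_pred = np.exp(model.predict(X))
```

If you use scikit-learn, `TransformedTargetRegressor` (with `func=np.log`, `inverse_func=np.exp`) wraps this pattern so the inversion happens automatically at predict time.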
I understand the labels scatter in a way the model cannot explain, but then that is simply the reality.
Can you show what the model residuals look like?
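As a quick illustration of what to inspect (toy data here; your real `X`/`y` would replace it):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic example standing in for your actual features and labels.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=1.0, size=300)

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)

# For a well-specified model, residuals center on zero and show no
# structure when plotted against model.predict(X).
print(residuals.mean(), residuals.std())
```

Patterns in that residual-vs-prediction plot (curvature, funnel shapes, clusters) hint at missing non-linearity, heteroscedasticity, or mixed populations.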
(See also this thread: True vs predicted values biased-intercept - #4 by Christian_Simonis)
In general, points to consider are:

- Is the scattering of your label caused by the labels emerging from different distributions (e.g. the insurance premium might be systematically different for a certain kind of characteristics)? In that case you could consider training a separate model per segment (e.g. one insurance premium model for car owners who just got their driver's license and one for more experienced drivers).
- Try to incorporate domain knowledge into your features with feature engineering; see also this thread: Time Series Linear Regression
- Depending on your residual analysis, consider models other than a linear one (which will only perform well if you manage to model all the non-linearity in your features). Gaussian processes might be worth a look, since they also allow you to account for uncertainty / confidence and to model non-linearity; see also this thread: Deep learning is a small part of ai - #6 by Christian_Simonis
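To make the Gaussian process suggestion concrete, here is a small sketch with scikit-learn (synthetic data; kernel choices are illustrative assumptions, not a recommendation for your data):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Noisy non-linear toy data.
rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=100)

# RBF captures smooth non-linearity; WhiteKernel absorbs label noise.
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# return_std=True yields a per-point uncertainty alongside the mean,
# which is exactly the confidence information a plain linear model lacks.
mean, std = gp.predict(np.array([[5.0]]), return_std=True)
```

The predictive standard deviation lets you flag regions where the model is extrapolating rather than interpolating.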
Hope that helps, @Nimish_Khandelwal!
Best
Christian