Data transformation

abdou_brk · November 28, 2023, 5:08pm

Hello everyone,
We've seen that when data are not normally distributed, we have to apply a transformation so that the feature looks normally distributed before fitting our anomaly detection model. But I don't understand how this can always work, because the transformation may lose some information from the original data. Here is what I mean: suppose we have to apply the transformation x to x^2 to get a normal distribution, and suppose the data point x1 = 1 is considered normal while x2 = -1 is considered anomalous. After the transformation both points map to 1, so x2 would end up being treated as a normal example, which is wrong.
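The concern can be reproduced in a couple of lines. This is only a minimal sketch, assuming NumPy is available; the variable names are made up for illustration and the two values are the ones from the question.

```python
import numpy as np

# The two points from the question: x1 is "normal", x2 is "anomalous".
x1, x2 = 1.0, -1.0

# Apply the x -> x^2 transformation to both points.
t1, t2 = np.square(x1), np.square(x2)

# Both points land on the same value, so the model can no longer tell them apart.
collapsed = bool(t1 == t2)
print(t1, t2, collapsed)
```

Because x^2 ignores the sign, any pair of points at +a and -a is merged, which is exactly the loss of information described above.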

TMosh · November 28, 2023, 5:20pm

That is an example where the transformation you used is not appropriate for that set of data.

rmwkwok · November 28, 2023, 9:02pm

Hi @abdou_brk,

Then you can try adding a constant to shift the range to positive values. For example, adding 2 changes the range from (-1, 1), as in your example, to (1, 3).

In the positive range, transformation by x^2 is monotonic, so it will only change the shape of the distribution without reordering the samples.
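As a rough check of the point about monotonicity, here is a small sketch, assuming NumPy and a handful of illustrative values spread over the range in the example (the array `x` and the name `shift` are assumptions, not course code).

```python
import numpy as np

# Illustrative values covering the range from the example, sorted ascending.
x = np.linspace(-1.0, 1.0, 9)

# Shift by a constant so every value is positive, then square.
shift = 2.0  # assumed constant taken from the suggestion above
transformed = (x + shift) ** 2

# If every consecutive difference is positive, the original order is kept.
order_preserved = bool(np.all(np.diff(transformed) > 0))

print(transformed.min(), transformed.max())  # 1.0 9.0
print(order_preserved)  # True
```

With the shift, x = -1 maps to 1 and x = 1 maps to 9, so the two points from the question stay distinct.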

Cheers,
Raymond
