Multi-output regression problem

Hello,

I am new to this forum.

I am running into an issue with a multi-output regression DNN problem. To be precise, there are 12 independent features and 9 dependent features (targets), with 33,600 samples in total; in short, the dataset shapes are (33600, 12) and (33600, 9).

Since the output features have a wide range for my application,
[-13818.05, -4.845e-11], they are normalized before training.
With this architecture, the model seems to perform well on the normalized data, with a maximum percentage error of around 1.2%. But once the predictions are denormalized using the reference min-max values, the percentage error is enormous. I am attaching the ranges of the ground truth, predictions, and percentage error for both the normalized and denormalized data for reference.

Range of normalized ground truth:
[0.4929172429487244, 0.9999999999999984]
Range of normalized predictions:
[0.49188894, 1.009478]
Range of normalized percentage error:
[1.2469393977925297e-06, 1.2643524127114114]
Range of denormalized ground truth:
[-13818.049816278619, -4.840588041497504e-11]
Range of denormalized predictions:
[-13846.071, 258.2756]
Range of denormalized percentage error:
[0.0001384670821255034, 14931722713440.225]
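
Here is a minimal, self-contained sketch that reproduces the blow-up (assuming the scaling is done with sklearn's MinMaxScaler, which matches my setup; the toy values are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy targets spanning a range like the one above: large negative
# values down to values extremely close to zero.
y = np.array([[-13818.05], [-5000.0], [-1.0], [-4.8e-11]])

scaler = MinMaxScaler()
y_scaled = scaler.fit_transform(y)

# Simulate a "good" model: a tiny error (~1e-3) in normalized space.
y_pred = scaler.inverse_transform(y_scaled + 1e-3)

# The absolute error is roughly 1e-3 * (max - min), about 13.8 everywhere:
# negligible for a target of -13818, but gigantic *relative* to -4.8e-11.
pct_err = np.abs(y_pred - y) / np.abs(y)
print(pct_err.ravel())
```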

P.S.: I have also tried standardization for scaling, but the results were worse than with min-max normalization.

Feel free to ask if there are any details you would like to know.
I would really appreciate any suggestions.

Regards,
Naresh


Normalization will usually involve subtracting the mean (so the data has zero mean) and dividing by the standard deviation.

Min-max scaling is not effective in a lot of situations.

Another option would be to use a form of compression using logarithms.
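
For reference, zero-mean/unit-variance scaling with scikit-learn looks like this (a sketch; the target array here is hypothetical):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical targets with the shape from the original post: (N, 9).
y_train = np.random.randn(1000, 9) * 100.0

# Zero-mean, unit-variance scaling; fit on the training targets only.
scaler = StandardScaler()
y_train_scaled = scaler.fit_transform(y_train)

# After prediction, map the outputs back to the original units.
y_back = scaler.inverse_transform(y_train_scaled)
assert np.allclose(y_back, y_train)
```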


@TMosh

Thanks for your response.

I tried the z-score StandardScaler method from the sklearn library, but the results were worse than with min-max normalization.

I thought of considering log normalization too, since the range is wide, but as you can see, the dependent feature range is entirely negative: [-13818.05, -4.8406e-11].

Since the log of negative data can't be taken, is there a preferable method to convert it to a non-negative range before taking the log (while avoiding data leakage between the train and test data), one that can be inverted back to the original data after denormalizing?


If all your output values are negative, you can multiply them by -1, then use log().
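
A sketch of that idea, assuming the targets are strictly negative as in the ranges above, and fitting any follow-up scaler on the training split only to avoid leakage:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Strictly negative targets, e.g. in [-13818.05, -4.8e-11].
y_train = -np.abs(np.random.randn(1000, 9)) * 1000 - 1e-11
y_test = -np.abs(np.random.randn(200, 9)) * 1000 - 1e-11

# Forward transform: negate (all values become positive), then log.
z_train = np.log(-y_train)
z_test = np.log(-y_test)

# Optionally rescale the log-space values; fit on the training data only.
scaler = MinMaxScaler()
z_train_scaled = scaler.fit_transform(z_train)
z_test_scaled = scaler.transform(z_test)  # no leakage: reuse train stats

# Inverse transform: undo the scaler, then exp, then negate.
def to_original(z_scaled):
    return -np.exp(scaler.inverse_transform(z_scaled))

assert np.allclose(to_original(z_test_scaled), y_test)
```

Since every step (negation, log, scaling) is invertible, the predictions can be mapped back exactly to the original units.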