Normalizing the mixed video datasets

Yakup_Abrek_Er · April 19, 2024, 11:33am

Hi there, I am trying to mix two video datasets to train my model but they have different mean and standard deviation values. From my previous experiments, I know the min-max scaling yields sub-optimal results so I want to apply Z-Score normalization but I don’t know how to determine the mean and std values. I couldn’t find the best practice on the internet, could you help me or share related resources? I appreciate your help, thank you in advance

TMosh · April 19, 2024, 3:01pm

Are you certain that normalzing the video data is going to be useful? I don’t have direct experience with this, but it seems to me that the data might already be sufficiently normalized through the video editing process.

Yakup_Abrek_Er · April 19, 2024, 3:37pm

Hi, thanks for the answer. Do you mean only applying Min-Max scaling or applying no scaling at all(values between 0-255)? The Data has a normal distribution but it is not a standard distribution. For more context, my model takes a video sample and predicts a scalar value.

By the way, pre-trained models on torchvision also apply normalization to the video data as a preprocessing step (Link).

TMosh · April 19, 2024, 4:06pm

If you know the max value is 255, then divide everything by 255 and it should be fine.

TMosh · April 19, 2024, 4:06pm

You don’t really need a mean value of 0.

Yakup_Abrek_Er · April 19, 2024, 4:40pm

Yes, it is a choice but it gives worse results compared to normalization with the training split statistics in the single dataset setting. I guess I need to experiment with different settings. Thanks

TMosh · April 20, 2024, 3:03am

When you change the normalization, you might also need to change the layer weight initialization method.

gent.spah · April 20, 2024, 6:10am

Also and I am thinking now, you say that normalization has been applied, if you find out what normalization has been applied to each then reverse for one dataset and make it similar to the other one!

Yakup_Abrek_Er · April 20, 2024, 8:48am

I use pre-trained weights to start but I can test it out too with random weights. I didn’t know it might be necessary, thanks

Yakup_Abrek_Er · April 20, 2024, 8:55am

Z-score normalization is applied and I have the statistics for both of the datasets. If I normalize every dataset individually, I will have apparently good results but afterward, my model will receive samples that come from various distributions that I can’t decide the statistics beforehand. Maybe, I can train the model as you said and normalize the validation split sample-wise to see the real performance.

gent.spah · April 20, 2024, 10:19am

You can always build a pipeline of code to normalize the incoming data in real time, according to your models demands and this is actually done in Machine Learning models in production.

So my idea is, first you find out how they are normalized, then you denormalize if needed and then unify the normalization process for all incoming channels. This is theoretical anyways.

altafkhan · April 23, 2024, 10:22am

Combine the 2 datasets and normalize using the mean/std-dev of the combined dataset. Use the mean/std-dev of the combined dataset to also normalize the test data and the data you receive during the production run.

Yakup_Abrek_Er · April 23, 2024, 12:06pm

Yes, I am planning to do it like this. I couldn’t find much material regarding this subject

TMosh · April 23, 2024, 3:17pm

Which Machine Learning courses have you attended? Normalization methods are covered in many of them.

Yakup_Abrek_Er · April 23, 2024, 3:49pm

I just couldn’t find what is the best practice to “merge” two or more image/video datasets with different data distributions as I stated in the question. If you know any course or material that specifically and objectively explains this concept I would appreciate that. Of course I already know about and tested the normalization methods we mentioned here but they are not satisfactory or applicable as I explained in my previous comments.

TMosh · April 23, 2024, 5:00pm

It’s not really feasible to create a model that will give optimum results on two different datasets that have fundamentally different statistical distributions.

Topic		Replies	Views
Question about rescaling Supervised ML: Regression and Classification week-2	4	503	July 7, 2022
C3_W2 - Practice Lab 1: Mean Normalization Unsupervised Learning, Recommenders, Reinforcement week-2	6	567	August 24, 2022
Standardization & Sigmoid function Neural Networks and Deep Learning	2	562	April 21, 2022
Confusing on normalisation Improving Deep Neural Networks: Hyperparameter tun	9	601	July 23, 2021
Normalization before a new prediction? Supervised ML: Regression and Classification week-2	3	528	August 27, 2022

Normalizing the mixed video datasets

Related topics