How to properly set jensen shannon divergence / infinity_norm to leverage TFDV drift/skew check features

datapug · April 12, 2022, 7:26pm

The courses in this specialization show how TensorFlow Data Validation (TFDV) can check for data skew and drift, and mention that “Setting the correct distance is typically an iterative process requiring domain knowledge and experimentation”.

How can one specify a reasonable initial jensen_shannon_divergence threshold (or an infinity_norm threshold for categorical features)? Is there a Python package, utility, or code snippet that can compute a reasonable threshold for a given feature of a dataset?

If not, what is the recommended way to conduct the experiment and find the most appropriate threshold for a given feature in a dataset?

Why was jensen_shannon_divergence prioritized over other approaches to measure statistical distance?

balaji.ambresh · October 16, 2022, 1:00pm

A reasonable threshold depends on how much variation in model performance you can accept.

Some features aren’t as heavily weighted as the rest. Model performance won’t change much even if less important features change a bit. So, you can set the L-infinity norm threshold high.

On the other hand, it’s safe to retrain a model when an important feature changes a lot. So, set the infinity norm threshold low for valuable features.
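As a rough illustration of the two cases above, here is a configuration sketch using TFDV's schema comparators. The feature names (`payment_type` as the important feature, `company` as the less important one), the toy data, and the threshold values are all made up for the example and are not recommendations.

```python
import pandas as pd
import tensorflow_data_validation as tfdv

# Hypothetical toy data: two categorical features from two time spans.
previous_df = pd.DataFrame({
    "payment_type": ["card", "card", "cash", "card", "cash", "card"],
    "company": ["a", "b", "a", "c", "b", "a"],
})
current_df = pd.DataFrame({
    "payment_type": ["cash", "cash", "cash", "card", "cash", "cash"],
    "company": ["a", "b", "b", "c", "b", "a"],
})

previous_stats = tfdv.generate_statistics_from_dataframe(previous_df)
current_stats = tfdv.generate_statistics_from_dataframe(current_df)
schema = tfdv.infer_schema(previous_stats)

# Important feature: low threshold, so small shifts are reported.
tfdv.get_feature(schema, "payment_type").drift_comparator.infinity_norm.threshold = 0.01

# Less important feature: higher threshold, so only large shifts are reported.
tfdv.get_feature(schema, "company").drift_comparator.infinity_norm.threshold = 0.3

# Numeric features would use drift_comparator.jensen_shannon_divergence.threshold instead.

# Drift is checked by comparing the current statistics with the previous span.
anomalies = tfdv.validate_statistics(
    statistics=current_stats,
    schema=schema,
    previous_statistics=previous_stats,
)
print(anomalies)
```

For training/serving skew rather than drift, the same pattern applies with `skew_comparator` on the feature and `serving_statistics` passed to `validate_statistics`.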

The goal is to avoid spending compute resources on retraining a model when it isn’t necessary. This is why the process is iterative and requires experimentation to figure out acceptable thresholds.
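One possible way to start that experimentation is to measure how large the distance gets between random halves of the same baseline data, and treat that "noise floor" as a lower bound for a first threshold. The sketch below is only a heuristic under my own assumptions, not something TFDV provides: the helper names, the equal-width binning, the 95% quantile, and the synthetic data are all illustrative. TFDV computes its distances from its own feature statistics, so the numbers may not match exactly.

```python
import math
import random
from collections import Counter


def l_infinity_distance(a, b):
    """Largest absolute difference in relative frequency over all categories."""
    freq_a, freq_b = Counter(a), Counter(b)
    return max(
        abs(freq_a[v] / len(a) - freq_b[v] / len(b))
        for v in set(freq_a) | set(freq_b)
    )


def js_divergence(a, b, bins=10):
    """Jensen-Shannon divergence (log base 2, so in [0, 1]) of two numeric
    samples, histogrammed on shared equal-width bins."""
    lo, hi = min(min(a), min(b)), max(max(a), max(b))
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def hist(xs):
        counts = [0] * bins
        for x in xs:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        return [c / len(xs) for c in counts]

    p, q = hist(a), hist(b)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(u, v):
        return sum(ui * math.log2(ui / vi) for ui, vi in zip(u, v) if ui > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def noise_floor(values, distance, trials=200, quantile=0.95, seed=0):
    """Distance seen between random halves of the same data, at the given
    quantile. A threshold below this would likely fire on sampling noise."""
    rng = random.Random(seed)
    values = list(values)
    half = len(values) // 2
    distances = []
    for _ in range(trials):
        rng.shuffle(values)
        distances.append(distance(values[:half], values[half:]))
    distances.sort()
    return distances[min(int(quantile * trials), trials - 1)]


# Synthetic baseline data (hypothetical feature names).
rng = random.Random(42)
payment_type = rng.choices(["card", "cash", "other"], weights=[6, 3, 1], k=2000)
trip_miles = [rng.gauss(5.0, 2.0) for _ in range(2000)]

cat_floor = noise_floor(payment_type, l_infinity_distance)
num_floor = noise_floor(trip_miles, js_divergence)

# A deliberately shifted copy, to see how a real change compares to the floor.
shifted_miles = [x + 3.0 for x in trip_miles]
shift_distance = js_divergence(trip_miles, shifted_miles)

print(f"L-infinity noise floor: {cat_floor:.4f}")
print(f"JS divergence noise floor: {num_floor:.4f}")
print(f"JS divergence after shift: {shift_distance:.4f}")
```

From there, the threshold for each feature can be raised above the noise floor according to how important the feature is, then adjusted after observing which alerts actually corresponded to a drop in model performance.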
