The courses in this specialization show how TensorFlow Data Validation (TFDV) can check for data skew and drift, and mention that “Setting the correct distance is typically an iterative process requiring domain knowledge and experimentation”.
How can one specify a reasonable initial jensen_shannon_divergence threshold (or an infinity_norm one for categorical features)? Is there a Python package, utility, or code snippet that one can apply to a given dataset feature to compute a reasonable threshold?
If not, what is the recommended way to properly conduct the experiment and find the most appropriate threshold for a given feature in a dataset?
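To make the question concrete, here is a rough sketch of the kind of experiment I have in mind: repeatedly split one feature's values into two random halves, compute the Jensen-Shannon divergence between the halves, and take a high quantile of those scores as a "noise floor" below which a threshold would fire on sampling noise alone. This is not TFDV's implementation. The helper names (js_divergence, noise_floor), the equal-width binning, the bin count, the base-2 logarithm, and the 95% quantile are all my own assumptions, and TFDV's internal binning may differ.

```python
import math
import random


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    Uses log base 2 (an assumption), so the result lies in [0, 1].
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def noise_floor(values, n_bins=10, n_trials=200, quantile=0.95, seed=0):
    """Estimate the JS divergence expected from sampling noise alone.

    Hypothetical helper: shuffles the values, splits them in half,
    and returns the given quantile of the half-vs-half divergences.
    """
    rng = random.Random(seed)
    data = list(values)
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature

    def hist(sample):
        # Equal-width bins over the full range; the max value goes in the last bin.
        counts = [0] * n_bins
        for v in sample:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
        return [c / len(sample) for c in counts]

    scores = []
    half = len(data) // 2
    for _ in range(n_trials):
        rng.shuffle(data)
        scores.append(js_divergence(hist(data[:half]), hist(data[half:])))

    scores.sort()
    return scores[min(int(quantile * n_trials), n_trials - 1)]


if __name__ == "__main__":
    rng = random.Random(1)
    feature = [rng.gauss(0, 1) for _ in range(2000)]  # synthetic stand-in data
    print("noise floor:", noise_floor(feature))
```

A threshold somewhat above this floor would presumably be a starting point, to be tightened or loosened depending on how much drift actually hurts the model. Is that a sensible way to run the experiment?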
Why was jensen_shannon_divergence prioritized over other approaches to measure statistical distance?