I did not understand the following from the final video of week 3 regarding recall/precision:
" Notice that picking the threshold is not something you can really do with cross-validation because it’s up to you to specify the best points. For many applications, manually picking the threshold to trade-off precision and recall will be what you end up doing"
The precision/recall curve is a way of visualizing how your model is performing. A more restrictive (higher) threshold will predict as positive only the examples the model is most confident about (higher precision), but it will also miss some true positives (lower recall), and vice versa.
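For instance, here is a minimal sketch of how you could plot that curve with scikit-learn (the synthetic dataset and the logistic regression model are assumptions for illustration; each point on the curve corresponds to one choice of threshold):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Assumed synthetic, imbalanced dataset just for illustration
X, y = make_classification(n_samples=2000, weights=[0.9], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # predicted P(y=1)

# Each (recall, precision) point corresponds to one threshold on the scores
precision, recall, thresholds = precision_recall_curve(y_test, scores)
plt.plot(recall, precision)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-recall trade-off (one point per threshold)")
plt.show()
```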
@alvaroramajo Thank you for replying. I do understand what you wrote, but the part I did not understand is where cross-validation is mentioned. The quote says we cannot pick the threshold with cross-validation, yet also that it is up to us to specify it. Is this sentence contradictory, or is there something I am missing?
Cross-validation compares metric performance across models trained with different sets of hyperparameters. However, we should exclude the threshold from that set of hyperparameters when the metric in question is precision or recall, because, for example, tuning the threshold down always increases recall.
You always achieve 100% recall when you set the threshold to 0, even if all your other hyperparameters are complete nonsense. In other words, since reducing the threshold always increases recall, nobody should bother tuning other hyperparameters via cross-validation if the goal is simply the best recall.
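A tiny sketch to illustrate (the random labels and random scores are assumptions, standing in for an arbitrarily bad model):

```python
import numpy as np
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)  # true labels
scores = rng.random(size=1000)          # a "model" that guesses randomly

y_pred = (scores >= 0.0).astype(int)    # threshold = 0: everything predicted positive
print(recall_score(y_true, y_pred))     # 1.0 -- every true positive is caught
```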
We can divide metrics into 3 categories where the threshold is concerned:

1. metrics that are monotonically increasing/decreasing with the threshold, e.g. precision/recall
2. metrics that are independent of the threshold, e.g. AUC (google "AUC Area under curve metric" for more)
3. other metrics

For type 1, we exclude the threshold from cross-validation. For type 2, tuning the threshold has no effect at all. For type 3, we can include the threshold in cross-validation. The sketch below contrasts the three types.
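Here is a minimal sketch contrasting the categories (the synthetic data and logistic regression are assumptions, and for brevity it scores on the training data). Recall moves monotonically with the threshold, AUC never looks at a threshold, and F1 peaks at an interior threshold, so it is the kind of metric a threshold can legitimately be tuned for:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, recall_score, roc_auc_score

X, y = make_classification(n_samples=2000, random_state=0)
scores = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

# Type 2: AUC is computed from the raw scores, so no threshold is involved
print("AUC:", roc_auc_score(y, scores))

for t in [0.0, 0.25, 0.5, 0.75]:
    y_pred = (scores >= t).astype(int)
    # Type 1: recall only goes down as t rises; Type 3: F1 rises then falls
    print(f"t={t:.2f}  recall={recall_score(y, y_pred):.2f}  "
          f"f1={f1_score(y, y_pred):.2f}")
```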
Hi @rmwkwok -
Your discussion here brings up another question that I've been thinking about when it comes to cross-validation for NNs. It makes sense why you wouldn't change the threshold when looking at precision or recall. But what if you are looking at accuracy and error? Varying alpha and lambda seems to be pretty normal, but what other inputs could be varied? It makes sense that anything that changes the learned weights could be a potential input to vary, so for example the number of epochs, the number of neurons in each layer, and the number of layers. Are those three appropriate to change, or are they something that people don't usually vary during cross-validation?
We compare different settings of all of them in cross-validation. Besides those, we can also compare the choice of activation function, the choice of input features (e.g. we can add polynomial features), the choice of how we initialize the neural network's weights, and so on. This list of tunable hyperparameters can cover anything related to the training data and the neural network itself.
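As a minimal sketch (the dataset, grid values, and use of scikit-learn's MLPClassifier are all assumptions for illustration), this is how such a comparison is often automated. Note that scikit-learn's `alpha` parameter is the L2 penalty (the course's lambda), while `learning_rate_init` plays the role of the course's learning rate alpha:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, random_state=0)

param_grid = {
    "hidden_layer_sizes": [(16,), (32, 16)],  # number and size of hidden layers
    "alpha": [1e-4, 1e-2],                    # L2 regularization (the course's lambda)
    "learning_rate_init": [1e-3, 1e-2],       # the course's learning rate alpha
    "activation": ["relu", "tanh"],
    "max_iter": [200, 500],                   # roughly, the number of epochs
}

# 5-fold cross-validation over every combination in the grid
search = GridSearchCV(MLPClassifier(random_state=0), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```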