(Self-tinkering on the exercise)
In 2.3 - Test the Model
The text explains that accuracy is a poor metric in this case because the labels are skewed.
It suggests using F1 or PrecisionAtRecall (or that is what I think is meant by precision/recall), but then says we won't bother with it, which prompted me to try it anyway x)
The built-in Keras metric:
https://www.tensorflow.org/api_docs/python/tf/keras/metrics/F1Score
does not seem to be available (I guess a newer version of Keras would need to be installed).
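If I read the docs correctly, with a newer TF/Keras (2.13 or later, I believe) it would be as simple as the snippet below; the threshold value is just my assumption for a sigmoid output, I could not verify it on this version:

```python
import tensorflow as tf

# Assumes a newer release where tf.keras.metrics.F1Score exists.
# threshold=0.5 would binarize the sigmoid outputs before computing F1.
f1 = tf.keras.metrics.F1Score(threshold=0.5)
```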
I found this code:
to build the metric myself. I am unfortunately not knowledgeable enough yet to know whether it is a correct implementation, but it is accepted and has some upvotes. I also used PrecisionAtRecall, which is available directly from Keras in this version, as another reference point to compare the non-validated F1 code against.
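For reference, the kind of batch-wise F1 metric I mean looks roughly like this (the exact snippet I used may differ in details; note that Keras averages it over batches, so it is only an approximation of the true F1):

```python
import tensorflow.keras.backend as K

def f1_score(y_true, y_pred):
    # Batch-wise F1: precision and recall are computed per batch from
    # rounded sigmoid outputs, then combined into F1.
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    recall = true_positives / (possible_positives + K.epsilon())
    return 2 * precision * recall / (precision + recall + K.epsilon())
```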
I tried both this and PrecisionAtRecall on the model.
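Roughly how I wired them in (`model` stands for the notebook's model; loss and optimizer are just placeholders for whatever the notebook already uses):

```python
import tensorflow as tf

model.compile(
    loss="binary_crossentropy",   # placeholder, use the notebook's loss
    optimizer="adam",             # placeholder, use the notebook's optimizer
    metrics=[
        f1_score,                                        # custom metric from above
        tf.keras.metrics.PrecisionAtRecall(recall=0.8),  # built into this Keras version
        tf.keras.metrics.PrecisionAtRecall(recall=0.5),
    ],
)
```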
During the additional training epochs I get:
- F1Score of 0.62 - 0.65
- PrecisionAtRecall(recall=0.8) of 0.48 - 0.6
- PrecisionAtRecall(recall=0.5) of 0.73 - 0.8
However, on the dev set I only get the following:
- F1Score of 0.23
- PrecisionAtRecall(recall=0.8) of 0.20
- PrecisionAtRecall(recall=0.5) of 0.30
The scores seem quite low, so did I mess up somewhere?
If not, is it correct to interpret these results as follows:
- The model is over-fitting during the extra training epochs (maybe due to the small dataset, or because the synthesized extra data we created is already part of the larger dataset used to pre-train the model)?
- The accuracy is indeed very misleading, because the model is not doing that well on the dev set according to the other metrics?
The empirical results we looked at, however, were fairly good.
It also did quite well on my own test, although I had little background noise.
These two would indicate the model is doing quite well, which leads me to believe I may not be understanding the metrics correctly?