Hi professors, tutors, and classmates, I just finished the “anomaly detection” part. I am a bit confused about how to split the data into the training and cv/test sets. My current understanding is that the sets should be split as described below. Please let me know whether my understanding is correct, or show me the correct way. Thanks!
Suppose we have some imbalanced data, where 0 = normal and 1 = anomaly, and some anomalous examples are already known.
- For the training set, we train on the features only, without labels. The original labels in the training set can be either 0 or 1; that does not matter. (Am I right?)
- For the cv/test sets, we use both the features and the labels, including both the 0 and 1 labels.
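
To make the split I have in mind concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption for illustration: the toy data, the 60/20/20 proportions, the function names `split_for_anomaly_detection`, `gaussian_density`, and `f1_score`, and the use of a per-feature Gaussian density with a threshold chosen by F1 on the cv set. It only shows where the labels are dropped and where they are used; it is not meant as the confirmed correct procedure.

```python
import numpy as np

def split_for_anomaly_detection(X, y, train_frac=0.6, seed=0):
    """Shuffle, then split into an unlabeled training set and labeled cv/test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(train_frac * len(X))
    n_cv = (len(X) - n_train) // 2
    train_idx = idx[:n_train]
    cv_idx = idx[n_train:n_train + n_cv]
    test_idx = idx[n_train + n_cv:]
    # Training set: features only, the labels are deliberately not returned.
    # cv/test sets: features together with their 0/1 labels.
    return X[train_idx], (X[cv_idx], y[cv_idx]), (X[test_idx], y[test_idx])

def gaussian_density(X, mu, var):
    """Product of independent per-feature Gaussian densities."""
    coef = 1.0 / np.sqrt(2.0 * np.pi * var)
    return np.prod(coef * np.exp(-((X - mu) ** 2) / (2.0 * var)), axis=1)

def f1_score(y_true, y_pred):
    """F1 for the anomaly class (label 1); returns 0.0 if there are no true positives."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy data: 980 normal points and 20 shifted "anomalies".
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = np.zeros(1000, dtype=int)
y[:20] = 1
X[:20] += 6.0

X_train, (X_cv, y_cv), (X_test, y_test) = split_for_anomaly_detection(X, y)

# Fit the density on training features only (no labels involved).
mu, var = X_train.mean(axis=0), X_train.var(axis=0)

# Use the cv labels to pick the threshold epsilon: flag x as anomaly if p(x) < epsilon.
p_cv = gaussian_density(X_cv, mu, var)
candidates = np.linspace(p_cv.min(), p_cv.max(), 1000)
best_eps = max(candidates, key=lambda e: f1_score(y_cv, (p_cv < e).astype(int)))

# Report the final score on the labeled test set.
p_test = gaussian_density(X_test, mu, var)
test_f1 = f1_score(y_test, (p_test < best_eps).astype(int))
```

In this sketch the labels `y` never reach the fitting step; they are only used to choose `best_eps` on the cv set and to compute `test_f1` on the test set.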