Hi network
I have a question about building a dataset for a machine learning project. If the data contain duplicate sample responses (i.e., the same measurement performed twice on the same sample), how can we make the best use of them? Here are the options I am considering:
Training Set: include the duplicate sample responses in the training set to help the model become well-tuned and robust within the distribution of the training data.
Cross-Validation Set: use the duplicate responses in cross-validation to get additional insight into how well the model performs across different folds of the data.
Test Set: train the model on one replicate and then evaluate its performance on the second replicate.
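Below is a minimal sketch of these options in plain Python with no libraries. It assumes a hypothetical table with one row per measurement and the made-up fields sample_id, replicate (1 or 2), x (a feature) and y (the response). For the training-set option, both replicates simply stay as separate rows. The hypothetical helper split_by_replicate implements the test-set option. The hypothetical grouped_folds builds cross-validation folds that keep both replicates of a sample in the same fold. As a ready-made alternative, scikit-learn's GroupKFold accepts sample_id as its groups argument.

```python
from collections import defaultdict

# Hypothetical data: 4 samples, each measured twice (replicate 1 and 2).
records = []
for s in range(1, 5):
    for rep in (1, 2):
        records.append({
            "sample_id": f"S{s}",
            "replicate": rep,
            "x": float(s),  # made-up feature value
            "y": float(s) + (0.02 if rep == 1 else -0.02),  # made-up response
        })


def split_by_replicate(rows, train_rep=1, test_rep=2):
    """Test-set option: train on one replicate, evaluate on the other."""
    train = [r for r in rows if r["replicate"] == train_rep]
    test = [r for r in rows if r["replicate"] == test_rep]
    return train, test


def grouped_folds(rows, k):
    """Return (train_idx, test_idx) pairs for k folds.

    All replicates of a sample are assigned to the same fold, so a sample
    never appears on both the training and the validation side of a split.
    Samples are assigned to folds round-robin in sorted sample_id order.
    """
    groups = defaultdict(list)
    for i, r in enumerate(rows):
        groups[r["sample_id"]].append(i)
    sample_ids = sorted(groups)
    folds = []
    for f in range(k):
        test_ids = set(sample_ids[f::k])
        test_idx = [i for sid in sample_ids if sid in test_ids for i in groups[sid]]
        train_idx = [i for sid in sample_ids if sid not in test_ids for i in groups[sid]]
        folds.append((train_idx, test_idx))
    return folds


for train_idx, test_idx in grouped_folds(records, k=2):
    train_samples = sorted({records[i]["sample_id"] for i in train_idx})
    test_samples = sorted({records[i]["sample_id"] for i in test_idx})
    print("train:", train_samples, "test:", test_samples)
# train: ['S2', 'S4'] test: ['S1', 'S3']
# train: ['S1', 'S3'] test: ['S2', 'S4']
```

In the replicate split, every test sample also appears in training through its other replicate. That score therefore mostly reflects how consistent the two replicates are, not how well the model handles new samples. The grouped folds avoid this by never placing the two replicates of a sample on different sides of a split.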
Thanks a lot for any feedback.
I am enjoying the course a lot.
Best
Julie