Hello! I have run into a problem in my hands-on project regarding how to build the datasets for a deep neural network.
My project requires me to analyze different subsets (or sources) of the same kind of data, much like the example in the DLS course video of web-page cats (large set) and consumer-camera cats (small set). Is it sensible for me to do the following?
Step 1. Add some consumer-camera cat examples to my training set, which is primarily web-page cats, so that the NN can learn from a bigger dataset and achieve higher accuracy
Step 2. Use a higher percentage of consumer-camera cats in my validation and test sets
Or:
Step 1. Mix the consumer-camera cat examples into the whole dataset, then split it randomly into train/dev/test sets at a reasonable ratio?
Personally, I have found that the second approach gives better performance on the test set, but I want to make sure that it is logically sound.
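To make the two options concrete, here is a minimal sketch of what I mean, using toy data instead of real images. The function names, the `(source, id)` tuples, the set sizes and the split fractions are all placeholders made up for illustration. In the first option I also assume, for simplicity, that the dev and test sets are drawn only from consumer-camera examples.

```python
import random


def split_option_a(web, consumer, n_consumer_train, seed=0):
    """First option: train on all web data plus a few consumer examples;
    dev and test come only from the remaining consumer examples (assumption)."""
    rng = random.Random(seed)
    consumer = list(consumer)
    rng.shuffle(consumer)
    train = list(web) + consumer[:n_consumer_train]
    rng.shuffle(train)
    held_out = consumer[n_consumer_train:]
    half = len(held_out) // 2
    return train, held_out[:half], held_out[half:]


def split_option_b(web, consumer, dev_frac=0.1, test_frac=0.1, seed=0):
    """Second option: pool both sources, shuffle, and split at random."""
    rng = random.Random(seed)
    pooled = list(web) + list(consumer)
    rng.shuffle(pooled)
    n_dev = int(len(pooled) * dev_frac)
    n_test = int(len(pooled) * test_frac)
    dev = pooled[:n_dev]
    test = pooled[n_dev:n_dev + n_test]
    train = pooled[n_dev + n_test:]
    return train, dev, test


def consumer_share(examples):
    """Fraction of examples whose source tag is 'consumer'."""
    return sum(1 for source, _ in examples if source == "consumer") / len(examples)


# Toy stand-ins for the two sources (hypothetical sizes).
web = [("web", i) for i in range(1000)]
consumer = [("consumer", i) for i in range(100)]

train_a, dev_a, test_a = split_option_a(web, consumer, n_consumer_train=50)
train_b, dev_b, test_b = split_option_b(web, consumer)

print("first option:  consumer share in dev/test =",
      consumer_share(dev_a), consumer_share(test_a))
print("second option: consumer share in dev/test =",
      consumer_share(dev_b), consumer_share(test_b))
```

With the second option, the dev and test sets end up with roughly the same source mix as the pooled data, so they are mostly web-page cats; with the first option they contain only consumer-camera cats.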
Thank you in advance,
Yuhan Chiang