Document Classification

Hello, I am starting a project to classify whether a PDF document is a CV or not. Do you have any tips or suggestions for a beginner in the field?

That type of problem, with one positive and one negative outcome, is called binary classification. Make sure the components you select for the model architecture are suitable for it; for example, not every loss function works with a two-class output. Also, use what is called a 'balanced' training set, meaning roughly equal numbers of positive and negative examples (here, CVs and non-CVs).
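
To make those two points concrete, here is a minimal sketch in plain Python. It assumes the PDF text has already been extracted and labelled (1 = CV, 0 = not a CV). The function names and the toy data are made up for illustration. Binary cross-entropy is shown as one commonly used loss for binary classification, and random downsampling as one simple way to balance a set; neither is the only option.

```python
import math
import random


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy (log loss) for labels in {0, 1}.

    p_pred holds the predicted probability of the positive class.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def balance_by_downsampling(examples, seed=0):
    """Return a shuffled subset with equal numbers of each label.

    examples: list of (text, label) pairs, label 1 = CV, 0 = not a CV.
    Examples from the larger class are dropped at random.
    """
    rng = random.Random(seed)
    pos = [e for e in examples if e[1] == 1]
    neg = [e for e in examples if e[1] == 0]
    n = min(len(pos), len(neg))
    balanced = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 3 CVs and 10 other documents.
examples = [("cv text %d" % i, 1) for i in range(3)]
examples += [("other text %d" % i, 0) for i in range(10)]

balanced = balance_by_downsampling(examples)
print(len(balanced), sum(label for _, label in balanced))

# A confident correct prediction is penalised far less than a confident wrong one.
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
print(binary_cross_entropy([1, 0], [0.1, 0.9]))
```

Downsampling throws data away, so if you have few CVs you may prefer to collect more of them or to weight the classes in the loss instead.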