Multi-class classification with dynamic Class Dataset

Hi everyone,
I’m working on a multi-class classification problem where we need to predict, for each sample, a class drawn from a class pool dataset.

We have two datasets, samples_dataset & class_dataset. The model iterates over every sample in samples_dataset and every possible class, then calculates characteristics/features between each sample and all classes in the class pool dataset.
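Schematically, the data preparation looks something like this (a simplified sketch; the names and the feature function are made up for illustration, not our actual code):

```python
import itertools

def pairwise_features(sample, cls):
    # Hypothetical placeholder: the real features compare one sample
    # against one candidate class from the class pool.
    return [abs(sample["value"] - cls["value"])]

def build_pairwise_rows(samples_dataset, class_dataset):
    # One row of features per (sample, class) combination.
    rows = []
    for sample, cls in itertools.product(samples_dataset, class_dataset):
        rows.append({
            "sample_id": sample["id"],
            "class_id": cls["id"],
            "features": pairwise_features(sample, cls),
        })
    return rows
```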

A NN model doesn’t generalize well when we introduce new classes into the class dataset, even though we are not explicitly using the class as a feature…

A RandomForest classifier does well during training but poorly on unseen classes (newly added classes).

If you have worked on similar problems, please discuss and share your recommendations. Thanks.

I think your models need retraining before prediction is made on unseen classes!

No supervised method is going to recognize classes that it was not trained to learn about.

Thanks for your comments. Indeed, the model is able to predict on unseen classes. The use case is to calculate features for all possible pairings and then predict the most probable class for each sample. Let me give you an example: samples_dataset contains m examples or samples. The data preparation process first calculates features for each possible sample_class combination, and I label each pairing with 1 or 0 (1 means the assigned class is correct; this is supervised learning, since the class is known for each sample). When I use a NN on a less challenging dataset (meaning learning is easier), the model is able to predict with good accuracy on unseen data. However, when the dataset is a bit more challenging in terms of learning, the accuracy drops.
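Roughly, the labelling and prediction steps look like this (a simplified sketch building on the pairwise rows above; the helper names and the scikit-learn RandomForestClassifier are just for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def make_training_set(pair_rows, true_class_of):
    # Label each (sample, class) pair: 1 if the candidate class is the
    # known correct class for that sample, 0 otherwise.
    X = np.array([row["features"] for row in pair_rows])
    y = np.array([int(true_class_of[row["sample_id"]] == row["class_id"])
                  for row in pair_rows])
    return X, y

def predict_most_probable_class(model, pair_rows):
    # Score every candidate class for each sample, then keep the candidate
    # with the highest predicted probability of being the correct match.
    X = np.array([row["features"] for row in pair_rows])
    proba = model.predict_proba(X)[:, 1]
    best = {}
    for row, p in zip(pair_rows, proba):
        sid = row["sample_id"]
        if sid not in best or p > best[sid][1]:
            best[sid] = (row["class_id"], p)
    return {sid: cls for sid, (cls, _) in best.items()}

# model = RandomForestClassifier().fit(*make_training_set(train_rows, true_class_of))
# predictions = predict_most_probable_class(model, test_rows)  # test_rows can contain new classes
```

Since the classifier only ever sees pairwise features, nothing in principle stops it from scoring classes that were added after training.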

When using RandomForest, the model does well on seen data and poorly on unseen data (the model still predicts, for many samples, classes that were not part of the training process, but the accuracy isn’t good enough)…

We’ve already discussed why this is happening.

Did you try using the class as the label and letting the NN learn to predict it directly from the input features? It’s not clear to me why you turned this into a binary 1/0 classification. Maybe I missed something.
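What I have in mind is the plain multi-class formulation, something like this (toy data, just to show the shape of it):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))      # per-sample input features
y = rng.integers(0, 4, size=300)   # the class itself is the target

clf = RandomForestClassifier(random_state=0).fit(X, y)
print(clf.predict(X[:5]))          # predicts one of the classes seen in training
```

Of course, a model like this can only ever output classes it saw during training, which may be exactly the limitation you are working around.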