Week 4 A2 - Train on a different dataset

I recently completed a machine learning course and decided to apply my newfound knowledge by experimenting with a different dataset. I used the “animal 10n” dataset, which contains human-labeled online images, to train an MLP model. Despite starting with the same parameters I used during the course assignment, I got perplexing results.

Details:

I selected 209 images from the “animal 10n” dataset for training, maintaining the same MLP configuration: layer dimensions [12288, 20, 7, 5, 1], a learning rate of 0.0075, and 2500 training iterations.
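
For reference, here is roughly how I prepare the inputs; this is just a sketch, assuming 64×64 RGB images (64 × 64 × 3 = 12288 features per image), and `train_images` is a placeholder for the loaded image array:

```python
import numpy as np

# Placeholder for the loaded "animal 10n" images:
# 209 examples of 64x64 RGB pixels (64 * 64 * 3 = 12288 features each)
train_images = np.zeros((209, 64, 64, 3))

# Flatten each image into a column vector, matching the 12288-unit input layer
train_x = train_images.reshape(train_images.shape[0], -1).T  # shape (12288, 209)

# Scale pixel values to [0, 1], as in the course assignment
train_x = train_x / 255.0

layers_dims = [12288, 20, 7, 5, 1]
learning_rate = 0.0075
num_iterations = 2500
```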

Here are the accuracy results I obtained:

  • Training set (a10n): 99%
  • Test set (a10n): 58%
  • Test Assignment set: 56%

As you can see, the model’s performance on the test sets is not satisfactory, barely surpassing a random guess. To address this, I experimented with increasing the number of perceptrons in the MLP, resulting in the configuration [12288, 32, 8, 4, 1], which significantly improved accuracy on the test set:

  • Training set (a10n): 99%
  • Test set (a10n): 85%
  • Test Assignment set: 34%

However, the model performed poorly on the assignment test set, registering only 34% accuracy. I’ve tried fine-tuning hyperparameters and adjusting the size of the training set, hoping that the “animal 10n” data would let the model learn a more general pattern that also applies to the assignment dataset. Unfortunately, my efforts have consistently yielded accuracy figures of around 34-36%.

I’m reaching out to the community for guidance on how to train a more general model that can achieve higher accuracy on both datasets. Any suggestions or insights would be greatly appreciated.

I think perhaps this thread could be moved to the “AI Projects” forum area, since it isn’t directly applicable to a specific course.

If you get 99% to 100% on the training set, you’re overfitting. That’s the first problem to fix.

The model is too complex, or you don’t have enough data, or you need to add some regularization or dropout.
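
To give a flavor of the L2 idea (just a sketch, not the course's exact code; the parameter names and example numbers here are made up): L2 regularization adds a penalty on the squared weights to the cost, which pushes the network toward smaller weights and a smoother decision function:

```python
import numpy as np

def cost_with_l2(cross_entropy_cost, parameters, lambd, m):
    """Add an L2 penalty, lambd/(2m) * sum of squared weights, to the cost."""
    l2 = sum(np.sum(np.square(W))
             for name, W in parameters.items() if name.startswith("W"))
    return cross_entropy_cost + (lambd / (2 * m)) * l2

# Made-up example with two weight matrices
params = {"W1": np.random.randn(20, 12288), "b1": np.zeros((20, 1)),
          "W2": np.random.randn(1, 20), "b2": np.zeros((1, 1))}
print(cost_with_l2(0.65, params, lambd=0.7, m=209))
```

The matching change in backprop is an extra (lambd / m) * W term in each weight gradient.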

Adding more perceptrons will only make overfitting worse.

I would suggest coming at this from another angle:

Instead of continuing to hammer on the MLP, which may or may not give you a better result, you could consider other network architectures, for example a CNN. A CNN should give you a good improvement on image data.
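
For instance, here is a minimal CNN sketch in Keras (my choice of library for illustration, not something from this course; the layer sizes are arbitrary placeholders):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Minimal CNN for 64x64 RGB inputs and a single sigmoid output.
# Layer sizes are arbitrary placeholders, not tuned values.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),
    layers.Conv2D(16, 3, activation="relu"),   # learn local image features
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),     # binary classification
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```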

If you get 99% to 100% on the training set, you’re overfitting. That’s the first problem to fix.

I thought the higher, the better. During the assignment I got 98% on the training set; is that overfitting as well? Is there any way to prevent overfitting? Should I constantly check whether accuracy on the training set exceeds some boundary, and stop training if it does?
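
In code, the check I had in mind would look something like this (just a sketch; `train_step`, `training_accuracy`, and the threshold are made-up stand-ins):

```python
import random

def train_step():
    """Made-up stand-in for one gradient-descent update."""

def training_accuracy():
    """Made-up stand-in for accuracy measured on the training set."""
    return random.random()

boundary = 0.97  # arbitrary threshold I would pick

for i in range(2500):
    train_step()
    if training_accuracy() > boundary:  # stop before the model memorizes
        break
```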

Adding more perceptrons will only make overfitting worse.
So the more perceptrons I have, the more specific the resulting function. That makes sense, TY!

The model is too complex, or you don’t have enough data, or you need to add some regularization or dropout.

I know nothing about regularization and dropout, so that is a learning opportunity for me :slight_smile: Also, maybe a simpler model will perform better; that's worth trying.

Thanks a lot for your suggestions!

That’s really good advice. I intentionally picked an MLP to use the same approach I was taught during the course :slight_smile:
According to the articles I’ve read, CNNs should significantly outperform MLPs for image classification.

Yes. It’s the topic of Course 2 Week 1. The relevant concepts include regularization (L2 or dropout) and getting a larger training set.
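
To give a flavor of dropout before you get there (a sketch of the standard "inverted dropout" trick, not the assignment's exact code): during training you randomly zero out a fraction of a layer's activations and rescale the rest:

```python
import numpy as np

def dropout_forward(A, keep_prob=0.8):
    """Inverted dropout on activations A: zero out units with
    probability (1 - keep_prob), then rescale so the expected
    activation magnitude is unchanged."""
    mask = (np.random.rand(*A.shape) < keep_prob)
    return (A * mask) / keep_prob, mask  # the mask is reused in backprop

# Example: apply dropout to a made-up (20, 209) hidden-layer activation
A1 = np.random.randn(20, 209)
A1_dropped, mask1 = dropout_forward(A1, keep_prob=0.8)
```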

Look at it this way. There is no reason to overfit the training set, because we already know all of its labels.
