I’m trying to use an AI model to predict a variable that has 12 classes. The data has 14 features and 144,470 training examples. The model starts right away at an AUC of 0.65 and doesn’t improve after that. Is there any rule of thumb that I’m not considering?
Yes. I started with one dense layer with 32 units, and it reaches its peak AUC of 0.65 very quickly. I ran it for more than 1000 epochs and it stops learning at an AUC of around 0.65. I also tried different batch sizes. Now I’m using 128, which made the NN reach its best AUC faster. That’s why I started testing with more layers and dropout.
With only 14 input features, I think any model is going to struggle to train well when you are trying to learn 3 million parameters from only about 144,000 examples.
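To put rough numbers on that, here is a small sketch that counts the weights and biases of a plain fully connected network and compares them with the number of training examples. The helper name count_dense_params is made up for illustration, and the 14 -> 32 -> 12 layout assumes the 32-unit model feeds straight into a 12-unit output layer, which the thread doesn't actually state.

```python
def count_dense_params(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the width of every layer, input first, output last.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per unit
    return total


n_examples = 144_470

# Assumed layout of the first experiment: 14 inputs -> 32 hidden -> 12 outputs
small = count_dense_params([14, 32, 12])
print(small)               # 876
print(n_examples / small)  # roughly 165 examples per parameter

# The large model: about 3 million parameters (figure taken from the thread)
large = 3_000_000
print(large / n_examples)  # roughly 21 parameters per example
```

So the small model has plenty of data per parameter, while the large one has far more parameters than examples, and both still land on the same AUC.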
Dropout is used to avoid overfitting. You don’t have overfitting (you’re struggling to get high training accuracy). So I recommend you not add any Dropout layers until you get some overfitting.
Making the model more complicated isn’t helping you get better predictions - because you get the same AUC for both a simple model and a complex model.
You didn’t mention what activation function you’re using. Hopefully you remembered to one-hot encode the output labels.
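In case it helps, this is a minimal sketch of what one-hot encoding the labels means, in plain Python. It assumes the 12 classes are already stored as integers 0 to 11. The function name one_hot and the sample labels are hypothetical, and most frameworks ship their own utility for this.

```python
def one_hot(labels, num_classes):
    """Turn integer class labels (0 .. num_classes - 1) into one-hot rows."""
    encoded = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} outside 0..{num_classes - 1}")
        row = [0] * num_classes
        row[label] = 1  # a single 1 marks the true class
        encoded.append(row)
    return encoded


# Hypothetical labels for three examples of a 12-class problem
y = [0, 11, 3]
y_one_hot = one_hot(y, num_classes=12)
for row in y_one_hot:
    print(row)
```

Each row then has 12 entries with exactly one 1, which matches an output layer with 12 units.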
Rules of thumb:
Start with one hidden layer, using sigmoid() activation.
The size of the hidden layer could be either:
the square root of the number of input features,
or the average between the number of input features and the number of output labels.
Once you get this working as well as you can, then try adding one more hidden layer (with both hidden layers having the same number of units).
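Applied to this problem (14 input features, 12 output labels), the two sizing rules above work out as in the sketch below. Rounding to the nearest whole unit is my own assumption, since the rules don't say how to handle fractions, and the function name hidden_size_candidates is just for illustration.

```python
import math


def hidden_size_candidates(n_inputs, n_outputs):
    """Return the two rule-of-thumb hidden layer sizes as whole units."""
    # Rule 1: square root of the number of input features (at least 1 unit)
    sqrt_rule = max(1, round(math.sqrt(n_inputs)))
    # Rule 2: average of the number of input features and output labels
    mean_rule = round((n_inputs + n_outputs) / 2)
    return sqrt_rule, mean_rule


sqrt_rule, mean_rule = hidden_size_candidates(14, 12)
print(sqrt_rule)  # 4  (sqrt(14) is about 3.74)
print(mean_rule)  # 13
```

Both candidates are much smaller than the 32 units tried first, so they are cheap to test before adding a second hidden layer.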