When I added only a single dense layer in a sequential model, the accuracy was more than 90%.
But as I added more dense layers with different numbers of nodes, the accuracy dropped to 50% and stayed there. What could be the reason behind this?
The problem is definitely in the 7-neuron dense layer, that's obvious. I am thinking that because the flatten layer has many neurons, and all that information has to pass, condensed, through only 7 neurons, there is not much room or capacity for them to learn and convey complex adaptations. ReLU also trims away all negative values.
See what happens to the accuracy if you increase the size of that dense layer.
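To make the bottleneck concrete, here is a minimal numpy sketch of the forward pass (the 28×28 input size is my assumption for illustration, not something stated in the thread): after Flatten, each image is a 784-dimensional vector, and a Dense(7) layer squeezes it into just 7 numbers that every downstream layer has to work with.

```python
import numpy as np

flat_dim = 28 * 28            # 784 features coming out of the Flatten layer
bottleneck = 7                # the suspect Dense layer

rng = np.random.default_rng(0)
x = rng.standard_normal(flat_dim)                    # one flattened image
W = rng.standard_normal((bottleneck, flat_dim)) * 0.05  # Dense(7) weights
b = np.zeros(bottleneck)

# Equivalent of Dense(7, activation="relu") applied to the flattened input
h = np.maximum(0.0, W @ x + b)
print(h.shape)   # (7,) -- everything after this layer sees only 7 values
```

However expressive the later layers are, they can only recombine those 7 numbers, which is the "condensed through only 7 neurons" point above.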
But the single output neuron also takes input from a large flattened array, yet it still gives good results?
There is no guarantee that any particular change to a network architecture is necessarily going to be an improvement, right? But I agree that in this case it seems plausible that it shouldn’t make things that much worse. Maybe you’ve got a problem with other hyperparameters like how long you run the training and what optimization method you are using. If you are adding layers to the network, maybe that doesn’t work if you just leave everything else constant.
You should also experiment with some of the points that Gent made. E.g. maybe ReLU is a bad choice because of the “dead neuron” problem. Try tanh in the non-final Dense layers and see if that makes a difference.
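As a quick illustration of the dead-neuron point (a toy numpy sketch, not the actual model): for negative pre-activations, ReLU's output and gradient are both zero, so those units get no learning signal, whereas tanh's gradient stays strictly positive everywhere.

```python
import numpy as np

z = np.array([-2.0, -0.5, 0.5, 2.0])    # example pre-activations

relu_grad = (z > 0).astype(float)        # d/dz max(0, z)
tanh_grad = 1.0 - np.tanh(z) ** 2        # d/dz tanh(z)

print(relu_grad)                # [0. 0. 1. 1.] -- negative inputs learn nothing
print(np.all(tanh_grad > 0))    # True -- every unit still gets some gradient
```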
+1 on @gent.spah’s comments regarding the impact of the new Dense layer. I would say an acceptable rule of thumb is that if you increase the number of parameters to be learned but don’t increase the amount of training (iterations, epochs, whatever), then accuracy could go down. The more params, the more time it takes to learn them all. If it were me, and I wanted to see what additional layers did to the accuracy of image classification, I would try adding groups of Conv+BatchNorm+ReLU+MaxPool instead of more Dense layers, then adjust training exposure accordingly. If you do more experiments, let us know what you find out!
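For anyone curious what one such block actually does to a feature map, here is a rough single-channel numpy sketch (the 28×28 input and 3×3 kernel are illustrative assumptions, and the "batch norm" here is just inference-style normalization of a single map, not the trainable Keras layer):

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Naive single-channel 'valid' 2D convolution (cross-correlation)."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def batch_norm(x, eps=1e-5):
    """Normalize a feature map to zero mean / unit variance."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 (assumes even spatial dims)."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

rng = np.random.default_rng(0)
img = rng.standard_normal((28, 28))       # one 28x28 "image"
kernel = rng.standard_normal((3, 3))      # one 3x3 filter

feat = conv2d_valid(img, kernel)          # (26, 26) after 'valid' conv
feat = np.maximum(0.0, batch_norm(feat))  # BatchNorm, then ReLU
feat = max_pool_2x2(feat)                 # (13, 13) after pooling
print(feat.shape)   # (13, 13)
```

Unlike a Dense bottleneck, each block keeps a spatial grid of features rather than collapsing everything to a handful of numbers, which is part of why conv stacks tend to scale better for image inputs.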
I tried tanh, but still nothing worked. But when I increased the number of neurons in the Dense layer before the output one to 50, the accuracy jumped up.
@ai_curious I also think that increasing the number of iterations could help in better adapting the weights of the newly added layers.
I’ve tried that, but my accuracy remains constant, i.e. 50%.
Those 7 dense neurons before the output are not the same as the 1 output neuron. I would like to think of the output neuron as a tap that opens or closes; not much processing happens there. But the 7 neurons are part of the processing pipeline.
ReLU units are really inefficient, since you don’t get any precision in the gradients: the derivative is either 0 or 1.
So when you use ReLU layers, you need a lot more units than if you had sigmoid() activation.
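To spell that out with a toy numpy check (not tied to the model in this thread): ReLU's derivative only ever takes the values 0 or 1, while sigmoid's derivative varies smoothly with the pre-activation and is capped at 0.25.

```python
import numpy as np

z = np.linspace(-4.0, 4.0, 9)             # sample pre-activations, step 1.0

relu_grad = (z > 0).astype(float)          # d/dz max(0, z): only 0 or 1
sigmoid = 1.0 / (1.0 + np.exp(-z))
sigmoid_grad = sigmoid * (1.0 - sigmoid)   # smooth values in (0, 0.25]

print(sorted(set(relu_grad)))              # [0.0, 1.0]
print(round(sigmoid_grad.max(), 2))        # 0.25 (reached at z = 0)
```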
Agree completely. I haven’t done the experiments myself to quantify with data, but my intuition is that anything that increases the number of trainable parameters - more layers, different layers, same layers with different shape parameters - merits additional training.