I am training a neural net to classify the coffee roasting data. I am using 2 layers, with 3 neurons in the first and 1 in the final layer. The layers use ReLU and sigmoid activations, respectively. The network isn't classifying correctly, and I just want to confirm my intuition that it ought to work. My understanding is that because the data can be classified using 3 decision boundaries, this architecture is sufficient. I just want to verify that this is so.
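For reference, here is a minimal Keras sketch of the architecture I mean (the input shape of 2 assumes the two coffee features, temperature and duration, already normalized):

```python
import tensorflow as tf

# 3 ReLU units in the hidden layer, 1 sigmoid unit in the output layer
model = tf.keras.Sequential([
    tf.keras.layers.Dense(3, activation="relu", input_shape=(2,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
)
```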
Hi @Jeremy_Epstein, I believe you are referring to the data used in one of the optional labs. The optional lab also used 3 units and 1 unit in its two layers, but sigmoid in both. So your change is not in the number of neurons or layers, but in the activation functions.
I suggest you start from that lab, change only one thing at a time, observe the difference in performance, form a hypothesis if it gets worse, and experiment with what other changes are needed to make it better.
A hypothesis would read something like: "given that I have only changed the activation in the first layer from sigmoid to ReLU, the performance has worsened, and that could be due to …"
If you have no idea what to do, I suggest you make a list of things you can tune and start playing with them; if something makes an improvement, see whether you can make sense of it. If you can, then you can fill in the missing clause in the hypothesis template above.
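For example, a rough sketch of that one-change-at-a-time experiment (hypothetical names; it assumes X_train and y_train are already loaded and normalized):

```python
import tensorflow as tf

def build_model(first_activation):
    # Same architecture throughout; only the first layer's activation varies
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(3, activation=first_activation, input_shape=(2,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(loss="binary_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    return model

# Change one thing at a time and compare the outcomes
for activation in ["sigmoid", "relu"]:
    model = build_model(activation)
    # X_train, y_train assumed defined (normalized features, binary labels)
    history = model.fit(X_train, y_train, epochs=10, verbose=0)
    print(activation, history.history["accuracy"][-1])
```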
I copied the lab code and compared it to mine. In my code I had not performed the tiling step, which multiplies the size of the training data by a factor of 1000. Tiling my data made the model predict properly. An interesting alternative I found was to train the model for 1000 times more epochs, which also got the model predicting correctly, though notably it was less time-efficient.
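For reference, the tiling step looks like this (a sketch following the lab code; X and Y here are the raw feature and label arrays):

```python
import numpy as np

# Repeat the dataset 1000 times so each epoch sees 1000x more examples
Xt = np.tile(X, (1000, 1))   # X assumed shape (m, 2): temperature, duration
Yt = np.tile(Y, (1000, 1))   # Y assumed shape (m, 1): binary labels
```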
It seems my original model did not train on enough samples: tiling the data by 1000 and training for 1000 times more epochs both increase the total number of examples seen (and gradient updates) by the same factor, which explains why either fix worked.