Thank you. So the goal of the exercise is to identify clusters of blue and red dots. Correct?
Is the fact that they are in a “flower” shape irrelevant for the algorithm? Would the algorithm work the same way if the dots were in any other shape?
@paulinpaloalto
Thanks for clarifying my questions. The exercises in the assignment went smoothly, and I really appreciated the opportunity to try out other shapes of planar data! It helped me visualize the various outcomes of the neural network we built.
I thought it was interesting that “noisy_circles” with 50 hidden units had an accuracy of 48%, which is about as bad as logistic regression (47%) and much worse than 20 hidden units (81%) or 5 hidden units (75.5%). If you have any insight into this outcome, please let me know.
Hi, @Nagyka. Interesting. I’m a little worried that something may be wrong in your code. I tried the “noisy circles” test and here are my results:
Accuracy for 1 hidden units: 59.5 %
Accuracy for 2 hidden units: 73.0 %
Accuracy for 3 hidden units: 79.0 %
Accuracy for 4 hidden units: 75.5 %
Accuracy for 5 hidden units: 79.0 %
Accuracy for 20 hidden units: 78.0 %
Accuracy for 50 hidden units: 79.0 %
If you look at the graphs, they look pretty similar, at least for the core part of the dataset. So n = 50 gives essentially the same results, but at hugely greater training cost. My conclusion is that, for this particular problem, 4 or 5 neurons seem to be close to optimal across all these datasets. The other thing you notice is that the training cost just goes through the roof with 20 and 50 neurons, so it takes a lot more work to get results that are the same or only very slightly better.
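For reference, here is roughly the shape of the Section 6 “tuning the hidden layer size” loop I ran. This is just a sketch that assumes the notebook’s nn_model(X, Y, n_h, num_iterations) and predict(parameters, X) helpers are already defined and that X, Y hold the “noisy circles” data, so double-check the exact signatures in your copy:

```python
import numpy as np

# Sketch only: assumes the notebook's nn_model and predict are already defined
# and that X, Y hold the "noisy circles" data in (features, examples) layout.
hidden_layer_sizes = [1, 2, 3, 4, 5, 20, 50]
for n_h in hidden_layer_sizes:
    # Section 6 uses 5000 iterations; the main training cell uses 10000
    parameters = nn_model(X, Y, n_h, num_iterations=5000)
    predictions = predict(parameters, X)
    accuracy = float((np.dot(Y, predictions.T)
                      + np.dot(1 - Y, 1 - predictions.T)) / float(Y.size) * 100)
    print("Accuracy for {} hidden units: {} %".format(n_h, accuracy))
```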
Here are the plots that I got for the “noisy circles” case:
These test cases are different from what we usually do in that there is no “test” dataset to constrain the behavior, and thus no such thing as “overfitting”. So you’d think that, in principle, a more complex network could do better. Some other experiments that might be interesting would be to train longer with the n = 20 and n = 50 cases and see whether the behavior changes. Then the question would be whether adding another layer would give yet better results at much lower training cost. E.g., try two hidden layers with maybe 6 and 4 neurons and see how that compares to 20 or 50 neurons in the single-hidden-layer case.
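If anyone wants to try the two-hidden-layer experiment, here is a minimal from-scratch sketch in NumPy (not the notebook’s code). The make_circles parameters are my guess at the “noisy circles” settings, and the learning rate is just the 1.2 from the single-layer model, so treat those as assumptions to tune:

```python
import numpy as np
from sklearn.datasets import make_circles

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Data in the course's (features, examples) layout: X is (2, m), Y is (1, m).
# The make_circles settings here are a guess at the "noisy circles" dataset.
X_raw, y_raw = make_circles(n_samples=200, factor=0.5, noise=0.3, random_state=1)
X, Y = X_raw.T, y_raw.reshape(1, -1)
m = X.shape[1]

np.random.seed(2)                  # fixed seed for reproducibility
sizes = [X.shape[0], 6, 4, 1]      # input -> 6 -> 4 -> 1
W = [np.random.randn(sizes[l + 1], sizes[l]) * 0.01 for l in range(3)]
b = [np.zeros((sizes[l + 1], 1)) for l in range(3)]

learning_rate = 1.2
for i in range(10000):
    # Forward pass: tanh on both hidden layers, sigmoid on the output layer
    A1 = np.tanh(W[0] @ X + b[0])
    A2 = np.tanh(W[1] @ A1 + b[1])
    A3 = sigmoid(W[2] @ A2 + b[2])

    # Backward pass for binary cross-entropy loss
    dZ3 = A3 - Y
    dW3 = (dZ3 @ A2.T) / m; db3 = dZ3.sum(axis=1, keepdims=True) / m
    dZ2 = (W[2].T @ dZ3) * (1 - A2 ** 2)
    dW2 = (dZ2 @ A1.T) / m; db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W[1].T @ dZ2) * (1 - A1 ** 2)
    dW1 = (dZ1 @ X.T) / m;  db1 = dZ1.sum(axis=1, keepdims=True) / m

    # Gradient descent update (in place)
    for param, grad in zip(W + b, [dW1, dW2, dW3, db1, db2, db3]):
        param -= learning_rate * grad

# Final forward pass with the trained parameters
A1 = np.tanh(W[0] @ X + b[0])
A2 = np.tanh(W[1] @ A1 + b[1])
predictions = sigmoid(W[2] @ A2 + b[2]) > 0.5
print("Accuracy with hidden layers (6, 4): {:.1f} %".format(100 * np.mean(predictions == Y)))
```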
@paulinpaloalto
Thank you so much for your insights and for including the graphs.
I appreciated your discussion of multiple layers with fewer neurons versus fewer layers with a larger number of neurons. And then there is the cost tradeoff.
I re-ran the model for noisy circles, and this time I got accuracy scores without the anomaly at 50 hidden units:
Accuracy for 1 hidden units: 62.5 %
Accuracy for 2 hidden units: 73.5 %
Accuracy for 3 hidden units: 82.5 %
Accuracy for 4 hidden units: 82.5 %
Accuracy for 5 hidden units: 79.5 %
Accuracy for 20 hidden units: 82.5 %
Accuracy for 50 hidden units: 82.5 %
I am not sure what caused the glitch before. My graphs are similar to yours, but slightly different. I assume the difference could be caused by the random initialization.
The initialization routine sets the random seed, so (unless you modified that code) that can’t explain the difference. Of course you’d never do that in a “real world” application, but they do it everywhere here to get reproducible results for the grader test cases. Notice that you got accuracy 3.5% higher than mine at 50 neurons, so something else must be different. One guess is that you used the “real” training code that invokes nn_model with 10k iterations, whereas I used the “tuning the hidden layer size” code in Section 6 that uses only 5k iterations. Or perhaps you modified something else about how the code is actually run.
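To make the seeding point concrete, here is roughly what the seeded initialization looks like. This is a sketch from memory, so the exact seed value and variable names in your copy of initialize_parameters may differ:

```python
import numpy as np

def initialize_parameters(n_x, n_h, n_y):
    # A fixed seed means every run draws the same "random" weights, so two
    # people running unmodified code should start from the same parameters.
    np.random.seed(2)  # the exact seed used in the notebook may differ
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}
```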
I ran the training for noisy circles with 10k iterations, and the accuracy values are closer to yours, but still not the same. Interestingly, the shapes of the solutions are somewhat different as well.
Hi @paulinpaloalto
My results were for 10K iterations, and they are closer, though not identical, to your results at that iteration count. I'm not sure what is causing the slight difference, unless we are working off different versions. I noticed that Coursera says: “This assignment was last updated on 25 May 2021, 4:30 AM PST (San Francisco Time).” I started working on it on May 26, 2021. Thank you! On to Week 4.
I double-checked, and the most recent updates to this assignment on May 25 were only changes to the public test cases, not to any of the logic in the notebook itself. So I don’t think the differences are explained by “versionitis”. One other possibility is that the statefulness of the notebook causes different results if you execute the cells in a different order (see the toy example below). But, as you say, the results are pretty close, so it’s probably a better use of time to go on to Week 4 rather than puzzle over this further. Onward!
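Here is a toy illustration of that statefulness point (hypothetical code, not from the notebook): NumPy’s global random state is shared across cells, so identical code can produce different numbers depending on which cells consumed random draws before it ran.

```python
import numpy as np

np.random.seed(3)
first = np.random.randn(2, 2)    # drawn immediately after seeding

np.random.seed(3)
_ = np.random.randn(5)           # pretend some other cell ran first and used the RNG
second = np.random.randn(2, 2)   # same code as above, but different numbers

print(np.allclose(first, second))  # False
```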