I am trying to implement a dropout model in the style of Andrew Ng's Deep Learning course 1, week 4. The data comes from the course 2, week 1 regularization assignment.
- “dropout_project.ipynb” is the main project file. When I run it, the cost becomes NaN after 1500 iterations.
- “deep_nn.py” and “dropout_and_regularization.py” are the helper function files.
- I have already tested my implementation for bugs.
I also have one question: does the “d” variable (the dropout mask) change on every iteration, or does it stay fixed across iterations? In my implementation I keep d1 and d2 fixed by calling np.random.seed(1) again at the start of each iteration.
Could someone please help me?
deep_nn.py (10.2 KB)
dropout_and_regularization.py (6.9 KB)
dropout_project.ipynb (21.4 KB)
Min-max analysis of aL after every 100 iterations for the different models. The max value of aL in the dropout model becomes 1 after 700 iterations.
- After 1500 iterations the min value of aL is 0 and the max value is 1, which causes the NaN in the cost and a divide-by-zero error in daL.
Please click my name and message your notebook as an attachment.
Thank you for the notebook.
Please move your original post to general discussions topic since your concern is not directly related to course assignments.
Someone with the bandwidth to help out on your personal project(s) will contact you and look at your code.
I’ll leave you with one tip. You can get away with building a Keras Sequential model and still use dropout, dense, and other layers. Look at this link to construct your model. Don’t worry about backpropagation. It’s implicitly taken care of by TensorFlow.
I have not looked at your code, but one point to make is that perfectly correct code can get NaN for the cost with either sigmoid or softmax output if the activation value “saturates” to exactly 0 or 1. You can add some logic to your cost calculations to check for that case and avoid getting NaN. Here’s a thread which discusses that.
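To make that concrete, here is a minimal sketch of one way to guard the cost against saturation: clip aL away from exactly 0 and 1 before taking logs or dividing. The function name `compute_cost_clipped` and the value `eps=1e-8` are my own assumptions for illustration, not anything from the course code or from the attached notebook.

```python
import numpy as np

def compute_cost_clipped(aL, Y, eps=1e-8):
    """Binary cross-entropy cost with aL clipped into [eps, 1 - eps].

    Hypothetical helper: the name and the eps value are assumptions.
    Returns the cost and the clipped activations so the same safe
    values can be reused when computing daL.
    """
    m = Y.shape[1]
    aL_safe = np.clip(aL, eps, 1.0 - eps)  # avoid log(0) and division by 0
    cost = -np.sum(Y * np.log(aL_safe) + (1 - Y) * np.log(1 - aL_safe)) / m
    return float(cost), aL_safe

# Saturated outputs: without clipping, the cost would be NaN here.
aL = np.array([[0.0, 1.0, 0.7]])
Y = np.array([[0, 1, 1]])
cost, aL_safe = compute_cost_clipped(aL, Y)

# Use the clipped values for daL too, so there is no divide by zero.
daL = -(np.divide(Y, aL_safe) - np.divide(1 - Y, 1 - aL_safe))
```

Clipping changes the cost only by a tiny amount when aL is saturated and leaves it untouched otherwise, so it should not affect what the network learns.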
On the point about setting the random seed in every iteration, that’s the way they have us do it in the assignments just for ease of grading, but I think it’s a mistake to do that in a “real” system: that is not the intent of dropout. The whole idea is that you want the behavior to be stochastic. If you just wanted a smaller network, you could have used a smaller network. Here’s a thread which discusses that point.
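As a rough sketch of what "stochastic" means in practice: seed the random generator once, outside the training loop, and draw a fresh mask on every iteration. The helper name `dropout_forward`, the layer shape, and `keep_prob=0.8` are assumptions chosen for illustration, not the course's or the poster's code.

```python
import numpy as np

def dropout_forward(A, keep_prob, rng):
    """Apply inverted dropout to activations A.

    Hypothetical helper: draws a new mask D on every call and scales
    the kept units by 1 / keep_prob so the expected activation is unchanged.
    """
    D = rng.random(A.shape) < keep_prob  # True where the unit is kept
    A_dropped = A * D / keep_prob
    return A_dropped, D

# Seed once, before the loop, so runs are reproducible but the
# mask still differs from one iteration to the next.
rng = np.random.default_rng(1)
A1 = np.ones((20, 50))  # stand-in for a hidden layer's activations

masks = []
for i in range(3):
    A1_dropped, D1 = dropout_forward(A1, keep_prob=0.8, rng=rng)
    masks.append(D1)
```

Within a single iteration, the same D1 drawn in the forward pass should be reused in the backward pass. Only across iterations does the mask change.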
Thanks for the help.
Fixing aL solves the problem.