Divide-by-zero log error encountered when image dimensions become larger

I wrote a very simple image recognition module for classifying whether an image is a given person or not, based on the simple logistic regression model from the assignment in Week 2, Course 1 of DLS.

In the assignment, the images used have the shape `64 x 64 x 3`. In my case, however, I was initially using `200 x 200 x 3` images. Computation was very slow, which is understandable given the image dimensions, so I scaled them down to `100 x 100 x 3`.

But even so, I'm still encountering a `divide by zero` error in my `cost` equation, from numpy's `log()` function. My cost equation is written as:

```
cost = (-1/m) * (np.dot(Y, np.log(A.T)) + np.dot((1-Y), np.log(1-A.T)))
```
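For reference, a common numerical safeguard (not part of the assignment, just a sketch) is to clip `A` away from exactly `0` and `1` before taking logs, so the cost stays finite even when the sigmoid saturates; `safe_cost` below is a hypothetical helper name:

```python
import numpy as np

def safe_cost(Y, A, eps=1e-12):
    """Cross-entropy cost with A clipped away from 0 and 1 to avoid log(0)."""
    m = Y.shape[1]
    A = np.clip(A, eps, 1 - eps)  # keep probabilities strictly inside (0, 1)
    return float((-1 / m) * (np.dot(Y, np.log(A.T)) + np.dot(1 - Y, np.log(1 - A.T))))

# With A exactly 0 or 1, the unclipped version would hit log(0) -> -inf;
# the clipped version returns a large but finite cost instead.
Y = np.array([[1, 0, 1]])
A = np.array([[1.0, 0.0, 0.5]])
print(safe_cost(Y, A))
```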

I also tried `64 x 64 x 3` images and got pretty good model performance, almost the same as that of the model in Week 2's programming assignment.

I am still learning the basics of NNs and don't quite understand yet why such a problem arises as the image dimensions become larger. I'm thinking maybe `regularization` will come into play?

Hello @maikeruji. I am assuming that you safely passed through the original assignment before your extra-curricular experimentation. Regularization will come into play for any model which is richly parameterized in the face of limited data.

That said, your immediate problem seems to be of a straight-up numerical variety, and you are suspecting the cost function. Here’s a shot from mid-court: try applying `np.log()` to the vector/matrix before using the transpose operation. Example: `np.log(1 - A).T`.

Hello @kenb , thank you so much.

I actually made a mistake with my learning rate. I totally missed the fact that since the dimensions I am using are larger, I should also have tried decreasing the learning rate.

I was only playing around with learning rates such as `0.005`, `0.002`, and even `0.01`. Thus, I was getting the `divide by zero` error, and even `NaN` in the `cost` variable.
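A toy illustration of why dimension and learning rate interact (my own sketch, not from the assignment; it assumes raw unnormalized pixel features and the course's logistic-regression gradient, with a deliberately extreme constant input): after a single gradient step from `w = 0`, the pre-activation `z = w.T @ X` grows linearly with the number of features, so a learning rate that is fine at `64 x 64 x 3` can push the sigmoid into saturation at `200 x 200 x 3`:

```python
import numpy as np

def z_after_one_step(n_features, lr, n_samples=64):
    """Worst-case toy: constant inputs, w starts at 0. Returns |z| after one step."""
    X = np.ones((n_features, n_samples))   # pixel features, all set to 1.0
    Y = np.zeros((1, n_samples))
    A = np.full((1, n_samples), 0.5)       # sigmoid(0) when w = 0, b = 0
    dw = X @ (A - Y).T / n_samples         # logistic-regression gradient for w
    w = -lr * dw                           # one gradient-descent step
    return float(np.abs(w.T @ X).max())

print(z_after_one_step(64 * 64 * 3, lr=0.005))    # |z| ~ 30.7
print(z_after_one_step(200 * 200 * 3, lr=0.005))  # |z| = 300 -> sigmoid saturates
```

At `|z| = 300` the sigmoid output is indistinguishable from exactly 0 or 1 in float64, which is where the `log(0)` comes from.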

So far, I have only tested on `68 training samples` and `15 test samples` of `200 x 200 x 3` images. I tried decreasing my learning rate to `0.0001` and the model is working fine now - with both the original cost equation I was using and the one you suggested with `np.log(1 - A).T`.

Let me share some sample results:

The model is not quite perfect, as per the result in the 3rd image - it gives a wrong prediction - which is totally understandable. And here is a summary of costs for the 1st image, using `2000` iterations with a `0.0001` learning rate:

```
Cost after iteration 0: 0.693147
Cost after iteration 100: 0.496348
Cost after iteration 200: 0.398713
Cost after iteration 300: 0.334916
Cost after iteration 400: 0.289068
Cost after iteration 500: 0.254152
Cost after iteration 600: 0.226520
Cost after iteration 700: 0.204052
Cost after iteration 800: 0.185414
Cost after iteration 900: 0.169709
Cost after iteration 1000: 0.156305
Cost after iteration 1100: 0.144742
Cost after iteration 1200: 0.134677
Cost after iteration 1300: 0.125843
Cost after iteration 1400: 0.118035
Cost after iteration 1500: 0.111089
Cost after iteration 1600: 0.104875
Cost after iteration 1700: 0.099286
Cost after iteration 1800: 0.094235
Cost after iteration 1900: 0.089650
```

If you don't mind, I'd like to ask for some brief insight into why `np.log(1 - A.T)` should be replaced with `np.log(1 - A).T` in this case? I tested both and they yield exactly the same performance.

The numpy log function operates "elementwise", so those two give exactly the same result; it is just a question of where the transpose happens in the order of operations. The reason for the NaNs is that if you "saturate" the sigmoid values to exactly 1 or 0, you end up with the log of 0. Of course the output of sigmoid is never exactly 0 or 1 mathematically, but we are dealing with the limits of finite floating-point representations.
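Both points can be verified in a few lines (a standalone sketch with `sigmoid` defined locally, not code from the assignment):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

A = np.array([[0.2, 0.7, 0.99]])
# Elementwise log commutes with transpose:
print(np.allclose(np.log(1 - A).T, np.log(1 - A.T)))  # True

# Saturation: for large enough z, 1 + exp(-z) rounds to 1.0 in float64,
# so sigmoid returns exactly 1.0 and log(1 - A) becomes log(0) = -inf.
print(sigmoid(40.0))  # exactly 1.0
with np.errstate(divide='ignore'):
    print(np.log(1 - sigmoid(40.0)))  # -inf
```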

Oh I see now.

Thanks a lot @paulinpaloalto and @kenb !

Of course, @paulinpaloalto is correct. Mathematically, the truth is plain. As I said, with little to go on, I took a shot from mid-court! On very rare occasions, I have discovered "source level" bugs.