I wrote a very simple image recognition module for classifying whether an image is of a given person or not, based on the simple logistic regression model from the assignment in week 2, course 1 of DLS.

In the assignment, the images used have the shape 64 x 64 x 3. In my case, however, I was initially using 200 x 200 x 3 images. The computation was very slow, which is understandable given the image dimensions, so I scaled them down to 100 x 100 x 3.

But even so, I'm still encountering a divide-by-zero error in my cost equation, from the numpy log() function. My cost equation is written as:
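(For context: the cost in the week-2 assignment is the standard logistic regression cross-entropy. A minimal sketch of what that computation presumably looks like, with `A` the sigmoid activations and `Y` the labels, both of shape `(1, m)` - the function name `compute_cost` is just illustrative:)

```python
import numpy as np

def compute_cost(A, Y):
    """Cross-entropy cost for logistic regression.

    A: sigmoid activations, shape (1, m)
    Y: ground-truth labels (0 or 1), shape (1, m)
    """
    m = Y.shape[1]
    # np.log(A) or np.log(1 - A) is where a divide-by-zero warning
    # appears if any activation saturates to exactly 0.0 or 1.0.
    cost = -(1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    return float(np.squeeze(cost))
```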

I also tried 64 x 64 x 3 images and got pretty good model performance, almost the same as that of the model in week 2's programming assignment.

I am still learning the basics of NNs and don't quite understand yet why such a problem arises as the image dimensions become larger. I'm thinking maybe regularization will come into play?

Hello @maikeruji. I am assuming that you safely passed through the original assignment before your extra-curricular experimentation. Regularization will come into play for any model which is richly parameterized in the face of limited data.

That said, your immediate problem seems to be of a straight-up numerical variety, and you are suspecting the cost function. Here’s a shot from mid-court: try applying np.log() to the vector/matrix before using the transpose operation. Example: np.log(1 - A).T.

I actually made a mistake with my learning rate. I totally missed the fact that since the dimensions I am using are larger, I should also have tried decreasing the learning rate.

I was only playing around with rates near 0.005, e.g. 0.002 and even 0.01. Thus, I was getting the divide-by-zero error, and even NaN in the cost variable.

So far, I have only tested on 68 training samples and 15 test samples of 200 x 200 x 3 images. I tried decreasing my learning rate to 0.0001 and the model is working reasonably well now - it ran on both the original cost equation I was using and the one you suggested with np.log(1 - A).T.
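(For anyone hitting the same error: besides lowering the learning rate, a common standalone safeguard - not part of the assignment, just a standard numerical trick - is to clip the activations away from exactly 0 and 1 before taking the log, so `np.log` never receives 0:)

```python
import numpy as np

def safe_cross_entropy(A, Y, eps=1e-15):
    """Cross-entropy cost with activations clipped away from 0 and 1."""
    # Clip so np.log never sees exactly 0.0, even if sigmoid saturated.
    A = np.clip(A, eps, 1 - eps)
    m = Y.shape[1]
    return float(-(1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)))
```

With this guard, a saturated activation of exactly 1.0 contributes a large but finite penalty instead of producing inf or NaN.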

The model is not quite perfect, as per the result in the 3rd image - it gives a wrong prediction - which is totally understandable. And here is a summary of costs for the 1st image, using 2000 iterations with a 0.0001 learning rate:

Cost after iteration 0: 0.693147
Cost after iteration 100: 0.496348
Cost after iteration 200: 0.398713
Cost after iteration 300: 0.334916
Cost after iteration 400: 0.289068
Cost after iteration 500: 0.254152
Cost after iteration 600: 0.226520
Cost after iteration 700: 0.204052
Cost after iteration 800: 0.185414
Cost after iteration 900: 0.169709
Cost after iteration 1000: 0.156305
Cost after iteration 1100: 0.144742
Cost after iteration 1200: 0.134677
Cost after iteration 1300: 0.125843
Cost after iteration 1400: 0.118035
Cost after iteration 1500: 0.111089
Cost after iteration 1600: 0.104875
Cost after iteration 1700: 0.099286
Cost after iteration 1800: 0.094235
Cost after iteration 1900: 0.089650

If you don’t mind, may I ask for some brief insight into why np.log(1 - A.T) should be replaced with np.log(1 - A).T in this case? I tested both and they yield exactly the same performance.

The numpy log function operates “elementwise”, so those two give exactly the same result; it is just a question of when the transpose happens. The reason for the NaNs is that if you “saturate” the sigmoid values to exactly 1 or 0, you end up with the log of 0. Of course, the output of sigmoid is never exactly 0 or 1 mathematically, but we are dealing with the limits of finite floating point representations.
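Both points are easy to check directly in numpy (z = 40 below is just an arbitrary large pre-activation to force saturation):

```python
import numpy as np

# 1) np.log is elementwise, so transposing before or after is identical.
A = np.array([[0.2, 0.9, 0.5]])
assert np.allclose(np.log(1 - A).T, np.log(1 - A.T))

# 2) Saturation: a large enough pre-activation makes sigmoid return
# exactly 1.0 in float64, because 1 + exp(-40) rounds to 1.0.
z = 40.0
a = 1 / (1 + np.exp(-z))
print(a == 1.0)              # True in float64

# And then log(1 - a) is log(0), which is -inf (numpy emits the
# "divide by zero encountered in log" warning at this point).
with np.errstate(divide="ignore"):
    print(np.log(1 - a))     # -inf
```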

Of course, @paulinpaloalto is correct. Mathematically, the truth is plain. As I said, with little to go on, I took a shot from mid-court! On very rare occasions, I have discovered “source level” bugs.