Need help with calculations

I have my own neural network implementation in Rust; as a starting point I used the code from the Course 1 task.

I fixed all the performance issues, and now I have an issue with the calculations. First of all, this is what I get as the cost (cross-entropy):

Cost after: 0 iteration: 0.8088958440701614
Cost after: 1 iteration: 0.8237759870051128
Cost after: 2 iteration: 0.840735343330898
Cost after: 3 iteration: 0.8602733776819088
Cost after: 4 iteration: 0.8830879543258277
Cost after: 5 iteration: 0.9101905243122623
Cost after: 6 iteration: 0.9431160410947709
Cost after: 7 iteration: 0.9843361269280926
Cost after: 8 iteration: 1.0381577297095745
Cost after: 9 iteration: 1.1129604205428372
Cost after: 10 iteration: 1.2279559478614726
Cost after: 11 iteration: 1.4408517713809594
Cost after: 12 iteration: 2.0457278644293826
Cost after: 13 iteration: 10.26770324918441
Cost after: 14 iteration: -0.0
Cost after: 17 iteration: -0.0
Cost after: 18 iteration: -0.0
Cost after: 19 iteration: -0.0
Cost after: 20 iteration: -0.0
Cost after: 21 iteration: -0.0
Cost after: 22 iteration: -0.0
Cost after: 23 iteration: -0.0
Cost after: 24 iteration: -0.0
Accuracy: 0.49999999999999917

As you can see, the cost is growing, and from iteration 14 onward it collapses to -0.0.

I tried different values of learning_rate and different layer configurations, and ChatGPT also advised me to add an epsilon to my dAL calculation to avoid getting NaN or inf. So, taking this code from the example:

dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))

then ChatGPT advised adding epsilon = 1e-8 to this second part:
np.divide((1 - Y) + epsilon, (1 - AL) + epsilon)
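
For reference, a more common stabilization than adding epsilon to only one term is to clip AL away from exactly 0 and 1 before computing both the cost and dAL. A minimal NumPy sketch (assuming Y has shape (1, m), as in the course code):

    import numpy as np

    def stable_cost_and_dAL(AL, Y, eps=1e-8):
        # Keep activations strictly inside (0, 1) so that log(AL),
        # log(1 - AL) and both divisions below stay finite.
        AL = np.clip(AL, eps, 1 - eps)
        m = Y.shape[1]  # assumes Y has shape (1, m)
        # Binary cross-entropy cost, averaged over the m examples.
        cost = -np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / m
        # Same dAL formula as above, but on the clipped activations.
        dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
        return cost, dAL

With the clipping in place, neither the logs nor the divisions can produce NaN or inf, so any remaining blow-up has to come from somewhere else.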

In other words, I did everything I could to make the cost decrease, but had no success.

Could somebody explain what could be wrong? I'm not sure if I'm allowed to share the code, so let me know if I can share extra details.

P.S. More info:

  1. my network structure is [ReLU, ReLU, Sigmoid] (the same as in the example; only the last layer has a sigmoid activation function). A sketch of this forward pass follows the list.
  2. as the dataset I have pictures of cats and dogs; when I built the dataset, I made input = (28x28x3, ) and output = (1, ). The output is 0.0 or 1.0, where 0 is cat and 1 is dog.
  3. the accuracy is always the same, no matter how many iterations.
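
For reference, here is a minimal NumPy sketch of what I mean by that structure, with the shapes from point 2 (the params dict and the hidden layer sizes are just for illustration):

    import numpy as np

    def relu(Z):
        return np.maximum(0, Z)

    def sigmoid(Z):
        return 1 / (1 + np.exp(-Z))

    def forward(X, params):
        # X has shape (2352, m): each column is one 28*28*3 flattened image.
        # params is a dict of weights W1..W3 and biases b1..b3.
        A1 = relu(params["W1"] @ X + params["b1"])      # hidden layer 1, ReLU
        A2 = relu(params["W2"] @ A1 + params["b2"])     # hidden layer 2, ReLU
        AL = sigmoid(params["W3"] @ A2 + params["b3"])  # output layer, sigmoid
        return AL                                       # shape (1, m)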

My guess would be that the growing cost is a second-order effect. The real problem is more likely in how you are computing the gradients or in how you are applying them, e.g., maybe your "update parameters" step is adding the gradients instead of subtracting them.

Also, if you have both cat and dog images, are you sure you can treat this as binary classification? Are there any images that contain neither a dog nor a cat, and if so, what label do they get? Do you have images that contain both dogs and cats?
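
To make that concrete: gradient descent has to step against the gradient, and a flipped sign produces exactly the kind of monotonically growing cost in your log. A minimal sketch of the update step (assuming parameters and gradients live in dicts keyed "W1"/"dW1" etc., as in the course code; not taken from your implementation):

    def update_parameters(params, grads, learning_rate):
        # Step *against* the gradient to decrease the cost.
        # Flipping -= to += makes the cost grow on every iteration.
        for key in params:  # "W1", "b1", "W2", "b2", "W3", "b3"
            params[key] -= learning_rate * grads["d" + key]
        return params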