Hi all,
I was super confused as to the instructions to use np.dot to compute the cost. Everything else in this course has been shown in videos, reading or exercises, but for the life of me, I cannot find anything on the computation of J across all examples using np.dot (or the equivalent mathematical formula). The vectorization videos go through everything BUT computing the cost when showing how to avoid the for-loops. I managed to do the exercises anyway with helpful hints. But no one else seems to have the same problem (and I’m guessing there must be others who aren’t experts in matrix operations yet), so I assume I must have missed something! Despite my attempts at a thorough search through the videos and training exercise (although I didn’t rewatch and redo everything), I STILL haven’t found it! If anyone can give me a hint as to where this was explained, I will be most grateful!
I wonder if there’s a typo in the comment, where it says “compute cost using np.dot”. As far as I can see, that comment should say “compute cost using np.sum”. (But please take my suggestion with a pinch of salt, since I am similarly finding this fiendishly difficult!)
I struggled with this too, but it turns out there’s a nice explanation for this in the Week 3 coding assignment, where you have to calculate cost again (week 3, exercise 5). They give some really useful tips on how to use np.sum, np.multiply, and np.dot. Here’s a screenshot from it (which doesn’t include any of my own code):
PS - if any moderators read this, please consider putting that bit of explanation in the week 2 coding assignment (and please forgive me for taking a screenshot!).
Thank you for bringing this to my attention. The comment is # compute cost using np.dot. Don't use loops for the sum. After reviewing it, I have realised these were supposed to be two separate comments; the second was not meant to depend on the first, as it currently appears to. I’ll have it fixed soon.
I am also confused by this. My function computes correctly if I use np.multiply and then np.sum, but when I use np.dot I cannot get my function to compute correctly.
Here is the problem with the np.dot I am having:
cost = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log(a^{(i)}) + (1 - y^{(i)}) \log(1 - a^{(i)}) \right)
where,
A – vector of activations of size (1, number of examples)
Y – true “label” vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)
Similarly, A = \sigma(w^T X + b)
where,
w – weights, a numpy array of size (num_px * num_px * 3, 1)
b – bias, a scalar
X – data of size (num_px * num_px * 3, number of examples)
Since w has size (num_px * num_px * 3, 1), its transpose w.T has size
(1, num_px * num_px * 3). Taking the dot product of w.T with X, whose size is
(num_px * num_px * 3, number of examples), gives A the size (1, number of examples).
So now, coming back to the cost function, a dot product of A with Y does not make sense: both are row matrices of size (1, number of examples), so their shapes are not compatible.
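The shape argument above can be checked directly in NumPy. This is only a small sketch with made-up sizes (n_features stands in for num_px * num_px * 3, and the values are random), not code from the assignment:

```python
import numpy as np

# Hypothetical sizes: n_features stands in for num_px * num_px * 3
n_features, m = 4, 3
rng = np.random.default_rng(0)

w = rng.standard_normal((n_features, 1))   # shape (n_features, 1)
X = rng.standard_normal((n_features, m))   # shape (n_features, m)
b = 0.0                                    # scalar bias

Z = np.dot(w.T, X) + b                     # (1, n_features) times (n_features, m) gives (1, m)
A = 1.0 / (1.0 + np.exp(-Z))               # sigmoid is elementwise, so the shape stays (1, m)
Y = np.array([[1, 0, 1]])                  # shape (1, m)

print(w.T.shape, Z.shape, A.shape, Y.shape)

# Multiplying two (1, m) matrices is not defined: the inner dimensions (m and 1) differ
try:
    np.dot(A, Y)
    dot_failed = False
except ValueError as err:
    dot_failed = True
    print("np.dot(A, Y) failed:", err)
```

So the error comes from matrix multiplication rules, not from anything specific to the cost function.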
If you have two 1 x m vectors and want the dot product, you have to transpose one of them, just as you did in the w^T \cdot X case. But those were column vectors, so you have to think through how the situation changes when the vectors are row vectors.
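To make the row-vector case concrete, here is a minimal sketch with two generic 1 x 3 vectors (u and v are placeholder names, not anything from the assignment). It shows that the order of the transpose matters, and that the "right" order matches elementwise multiply followed by sum:

```python
import numpy as np

u = np.array([[1.0, 2.0, 3.0]])   # shape (1, 3)
v = np.array([[4.0, 5.0, 6.0]])   # shape (1, 3)

inner = np.dot(u, v.T)            # (1, 3) times (3, 1) gives (1, 1): the dot product
outer = np.dot(u.T, v)            # (3, 1) times (1, 3) gives (3, 3): not what we want here
elementwise = np.sum(u * v)       # multiply elementwise, then sum: a plain scalar

print(inner.shape, outer.shape)   # (1, 1) (3, 3)
print(inner.item(), elementwise)  # 32.0 32.0
```

Note that the np.dot result is a 1 x 1 array rather than a scalar, so something like .item() or np.squeeze is needed if a plain number is expected.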
I am not certain whether I should be grateful or angry. On the upside, I had to think about the problem for a long time, which made me dive deeper into the np.dot() vs np.sum() operations. I think I won’t forget the implications for quite some time.
Still doesn’t work for me. In the Python Basics last practice question, I get the wrong answer no matter what I do with the dot product. I get the correct answer with sum() and multiply.
What answers do you get with dot product? One important thing to note is that this thread is about the cost function in Week 2 where we are dealing with 2 dimensional vectors. In the Numpy exercise the vectors only have 1 dimension, so things should be simpler: no transposes required.
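For the 1-dimensional case, a quick sketch (x1 and x2 are just illustrative values) shows that np.dot and multiply-then-sum agree, and that transposing a 1-D array does nothing:

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0])    # shape (3,), only one dimension
x2 = np.array([4.0, 5.0, 6.0])    # shape (3,)

d = np.dot(x1, x2)                # inner product of two 1-D arrays, returns a scalar
s = np.sum(np.multiply(x1, x2))   # elementwise product, then sum

print(d, s)                       # 32.0 32.0
print(x1.shape, x1.T.shape)       # (3,) (3,): the transpose of a 1-D array is a no-op
```

If the two results differ in your own code, it is worth printing the shapes of the inputs first, since a 2-D input would change what np.dot computes.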
@Tomy: Your “update parameters” logic is incorrect. Have another look at the math formulas for that. Your code does not agree with what the formula says. Note that the first cost is correct because that is after 0 iterations, meaning that no updates have happened yet. It only goes wrong when you update.
Also just as a general matter, please note that we aren’t supposed to publicly share solution code, even if it is incorrect. If the mentors can’t figure out how to help without seeing your code, we’ll ask to share it in a private way by DM.
Hi there.
I hope you all are enjoying your long weekend.
I am facing a similar problem, this time in the optimize function. It is a dimension mismatch, but I don’t understand why it appears or where it comes from, and I could not debug the error in my code. Here is the error.
Print the type and shape of your b value before the call to propagate from optimize. I’ll bet it is a 2 x 2 numpy array, but that may not happen on the first iteration. That is wrong: it should be a scalar, right? So how did that happen? Take a careful look at how you implemented the “update parameters” logic.
Also note that the shape of your w value is also probably wrong. The result of w^T \cdot X should be 1 x m (where m is 3 in this test case). So why does it turn out to be 2 x 3?
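As a sketch of the kind of check suggested above: the snippet below prints types and shapes and shows how broadcasting can silently turn a 1 x 3 result into 2 x 3 once b stops being a scalar. The data, the helper names describe and check_shapes, and the "bad" update are all hypothetical; this is one way such shapes can arise, not a diagnosis of any particular code.

```python
import numpy as np

X = np.ones((2, 3))        # made-up data: 2 features, 3 examples
w = np.zeros((2, 1))       # shape (2, 1)
b = 1.5                    # scalar, as it should be

def describe(name, value):
    """Print the type and shape of a value (hypothetical debugging helper)."""
    print(name, type(value).__name__, np.shape(value))

def check_shapes(w, b, X):
    """Fail early if w or b has drifted away from the expected shapes."""
    assert w.shape == (X.shape[0], 1), f"w has shape {w.shape}"
    assert np.ndim(b) == 0, f"b should be a scalar, got shape {np.shape(b)}"

describe("b", b)
Z = np.dot(w.T, X) + b     # (1, 3) plus a scalar stays (1, 3)
describe("Z", Z)

# Hypothetical mistake: b gets updated with an array of shape (2, 1)
b_bad = b - 0.1 * np.ones((2, 1))
Z_bad = np.dot(w.T, X) + b_bad   # (1, 3) plus (2, 1) broadcasts to (2, 3) without any error
describe("b_bad", b_bad)
describe("Z_bad", Z_bad)

check_shapes(w, b, X)      # passes
try:
    check_shapes(w, b_bad, X)
    caught = False
except AssertionError as err:
    caught = True
    print("shape check failed:", err)
```

Calling a check like this at the top of each iteration points to the update step where the shape first changes, rather than to the later line where NumPy finally raises.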
Thanks for your response.
In each iteration, b and the activation A are changing shape. For i = 1, b = 1.5 and A is 1x3; for i = 2, b is 2x1 and A is 2x3. In the third iteration b becomes 2x2, which is where the error shows up.
I believe that these variables should not change shape inside the loop.
But in this way, it doesn’t work … why?
Maybe it has something to do with the order in which the parameters are passed to the compute_cost function?
Many thanks, and pardon me