Wrong grade. Please review

In the A/B Testing programming assignment,

(1) Exercise 4: reject_nh_t_statistic
– My answer in the notebook was “False” and matched the Expected Output “False” in the notebook. The grader output captured my “False” answer, but the expected value changed to “True”.

(2) Exercise 7: z_statistic_diff_proportions
– My answer in the notebook was “-3.8551”, which exactly matched the “Expected Output” in the notebook. The grader output incorrectly captured my answer as “-3.2391764645307353” and also incorrectly changed the expected value to “-2.581988897471611”.

Please review and correct the grades, as I believe I got both exercises correct. Thank you.

Passing a unit test does not prove that a piece of code is correct or that it will pass a different test. Suppose your task is to write a function that adds two numbers passed to it as parameters. If the function always returns 6, it will pass many unit tests, say with parameters 1 and 5, or 4 and 2. But it won’t pass a unit test with parameters 3 and 2, which also has a different expected output. I suggest looking closely to see whether you hard-coded something that should have used a parameter, or inadvertently used a global variable instead of one with local scope. The grader uses different test parameters and runs each graded function in isolation from your notebook, so variables that were in scope when you ran locally may no longer be. Let us know if you find anything interesting.
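
Here is a minimal sketch of that analogy. The function name `add_always_six` is hypothetical and chosen only for illustration; it is not from the assignment.

```python
def add_always_six(a, b):
    # Bug: ignores both parameters and returns a hard-coded value.
    return 6

# These checks pass even though the function is wrong.
assert add_always_six(1, 5) == 6
assert add_always_six(4, 2) == 6

# A test with different parameters has a different expected output (5),
# and the hard-coded function fails it.
print(add_always_six(3, 2) == 3 + 2)  # prints False
```

Note that the expected value in the last check changed only because the inputs changed, not because of anything the function did.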

Hi, ai_curious!

Not sure I follow your analogy, so not sure if it applies in this case. Anyway, I did not hard-code any answer. What I’m saying is that, for Exercise 4, the Expected Output changed from False in the notebook to True in the auto-grader. Is that normal? For Exercise 7, auto-grader says my answer is wrong, but that’s not even my answer. And the auto-grader is showing the expected output that is different from the expected output in the notebook. Are these normal?

Thank you.

There is no guarantee that the autograder uses the same parameters as the tests in the notebook. In fact, you should always assume it doesn’t. If the function is called with different input parameters, wouldn’t you expect different outputs? It would be alarming if the autograder passed in different parameters but still got the same result you produced in the notebook.

The most common reason a graded function matches the expected output in the notebook but not in the autograder is hard-coding. The second is use of global variables. In both cases, the computation does not use the parameters passed in or other variables in local scope. Global variables in particular can be hard to debug. Give it a good look.
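
A minimal sketch of the global-variable case, assuming a hypothetical notebook in which an earlier cell defined `sample`. The names `sample`, `mean_buggy`, and `mean_fixed` are made up for illustration and are not from the assignment.

```python
# Hypothetical notebook state: a variable defined in an earlier cell.
sample = [2.0, 4.0, 6.0]

def mean_buggy(data):
    # Bug: reads the global `sample` instead of the `data` parameter.
    return sum(sample) / len(sample)

def mean_fixed(data):
    # Uses only the parameter that was passed in.
    return sum(data) / len(data)

# Notebook-style check: the argument happens to be the global, so both agree.
assert mean_buggy(sample) == mean_fixed(sample) == 4.0

# Grader-style check: a different argument exposes the bug.
other = [10.0, 20.0]
print(mean_buggy(other))  # prints 4.0, still computed from the global
print(mean_fixed(other))  # prints 15.0
```

If the function is run in isolation, where the global was never defined, the buggy version would raise a `NameError` instead of returning a wrong number.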

Let me explain the problem in a different way.
The notebook says “A” is the correct answer, and the auto-grader says “B” is the correct answer. I don’t have anything to do with either answer.

In another instance, my answer is “C”, and the auto-grader says my answer is “D”.

It is possible that there is an error in the grader; someone from deeplearning.ai will have to look into that. All I’m suggesting is to be cautious about how much you read into passing a unit test: it doesn’t prove anything. This is true in these classes as well as in the rest of your programming life. The code below exhibits the characteristics you describe: it passes one unit test, fails the autograder, and the autograder reports a different expected outcome than the original unit test. The bug is in my implementation, not the grader.

def add(a, b):
    return a + a

#unit test 1
notebook_expected_result = 4
assert (add(2,2) == notebook_expected_result)

print("This code is perfect!")

#autograder test 1
grader_expected_result = 5
assert(add(2,3) == grader_expected_result)

Output:

This code is perfect!

AssertionError                            Traceback (most recent call last)
Input In [8], in <cell line: 12>()
     10 #autograder test 1
     11 grader_expected_result = 5
---> 12 assert(add(2,3) == grader_expected_result)


PS: I notice that you have reported trouble with the grader twice before. I encourage you to think about the pattern. What was it about the previous implementations that the grader objected to? How did you fix it? How might that apply here? Hope this helps.

Yes, that was going to be my suggestion as well. Recall the case in which you had a similar problem with the DLS C1 W3 Planar Data Assignment. In that case it turned out that the form of hard-coding was very subtle, but was caused by the fact that you changed parts of the provided template code that did not need to be changed. The result of that change was that one of the parameters to the function was ignored, and the grader used a different value for that parameter than the hard-coded value you used. But that hard-coded value happened to be the one that matched the test cases in the notebook.

I firmly believe these cases are different…

In one exercise, the notebook says the expected output is “A” and my answer is “A”, and the auto-grader says the expected output is “B”.
In another exercise, the notebook says the expected output is “A” and my answer is “A”, and the auto-grader says my answer is “B” and the expected output is “C”.

Could my code change the expected output? Or could my code make the auto-grader generate an answer for me that is different from what I’d submitted? If the answer to both is “yes”, then I’d like to know under what conditions these could happen.

Thank you.

But ai_curious explained what that means, right? It simply means that the grader uses a different test case. So the point we are trying to make is that you must be missing something about your code that implies that your solution is not general.

But I have not taken this course, so don’t have the ability to help directly. The course is pretty new, so maybe you’re actually right and you’re the first one to discover some landmine in the facilities here. The real solution here is that we need the involvement of a real mentor for M4ML or one of the course staff. I will contact them and ask them to get someone to help here.

Ok, thank you. No worries.
Believe it or not, I hate bothering other people, and I’m sure you have better use of your time. If the grading is based on the answer, the code, etc., we can all be helped with a bit of explanation of, or even just a pointer to, what is wrong.

No worries.

The grader uses different tests than the ones in the notebook.

Your code must work correctly in any situation.

Hi @Arnold_F_Babila, can you send me a copy of your notebook via DM so I can take a look?

Andres- No worries. Thanks for reaching out.

But the point is that the grader does not have the ability to analyze your source code. It simply runs tests and checks the results that are generated by your code. If the answers are wrong, that means there is some problem with your code. So what else is the grader supposed to say? The next step is that you need to have the debugging skills to find the bug. If you can’t, then you can appeal for help here from the mentors or the course staff, which is what you have done. Please send a DM to Andres with your ipynb file. Just click “File → Download as notebook (ipynb file)” and then click Andres’s name and then the “Message” icon. Then use the little “Up Arrow” tool on the Message editor to attach the ipynb file.
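
To make that concrete, here is a rough sketch of what a test-based grader does. This is not the actual Coursera grader; `run_hidden_tests` and the test cases are hypothetical, and `add` is the buggy function from the example earlier in the thread.

```python
def run_hidden_tests(func, cases):
    """Call `func` on each (args, expected) pair and report mismatches."""
    failures = []
    for args, expected in cases:
        got = func(*args)
        if got != expected:
            failures.append({"args": args, "got": got, "expected": expected})
    return failures

def add(a, b):
    return a + a  # same bug as the example earlier in the thread

notebook_cases = [((2, 2), 4)]  # inputs visible in the notebook
hidden_cases = [((2, 3), 5)]    # inputs only the grader knows

print(run_hidden_tests(add, notebook_cases))  # prints []
print(run_hidden_tests(add, hidden_cases))
# prints [{'args': (2, 3), 'got': 4, 'expected': 5}]
```

Both "got" and "expected" in the report come from the hidden inputs, so neither has to match the numbers shown in the notebook. The "got" value is still produced by the submitted code.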

A pointer to the actual failing test would be useful. That is generally available in the real world. As it is, the autograder fails often, for completely inscrutable reasons, and there isn’t really a practical path to debugging beyond

a. making sure your syntax is correct.
b. making sure your output matches the “Expected output”

There is another thread in this forum where a person’s notebook was failing all the tests, but they were sure their code was correct. The fix was to “reset” their notebook, and then all the tests passed. Which to me says lack of debugging skills is not generally the problem.

A thing to check (stupid human error). Make sure you don’t leave random “# grade-up-to-here” comments in your notebook, because then you will get 0s for subsequent cells. And tear your hair out trying to figure out why. :grinning:

But not understanding the state of the notebook can also be a problem, as in that case. Note that in some cases the grader does not do an automatic “Save” when you click “Submit”. Typing new code into a function and then calling it again also does nothing: it just reruns the old code. You have to press “Shift-Enter” on the cell containing the changed function for the new code to be incorporated into the image. The symptoms in the case you describe suggest that the problem had something to do with notebook state. There’s not much the grader can do about that.

Don’t get me wrong: I’m not arguing that the graders are not frustrating in a lot of cases. But apparently some of this is due to limitations of the Coursera platform. So using the forums to get help is the best we can offer.

Can you check how you are calculating your p-value for Exercise 4 and Exercise 7? That’s where I had a bug in my code. After spending some time on it, I was able to fix it, and all the grader tests passed.
Your case might be different, but I’m mentioning it in case it helps.
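
One thing worth checking in a p-value calculation is which tail (or tails) it uses. The sketch below is only an illustration under assumptions: it uses a standard normal null distribution, the function name `p_value_from_z` and the `alpha` value are hypothetical, and it does not claim to match the assignment's required signature or hypotheses.

```python
from statistics import NormalDist

def p_value_from_z(z, alternative="two-sided"):
    """p-value of a z statistic under a standard normal null distribution."""
    cdf = NormalDist().cdf
    if alternative == "two-sided":
        return 2 * (1 - cdf(abs(z)))
    if alternative == "less":
        return cdf(z)
    if alternative == "greater":
        return 1 - cdf(z)
    raise ValueError(f"unknown alternative: {alternative!r}")

z = -2.581988897471611  # grader's expected z statistic quoted in the first post
alpha = 0.05            # assumed significance level, not taken from the assignment

for alt in ("two-sided", "less", "greater"):
    p = p_value_from_z(z, alt)
    print(alt, round(p, 4), "reject" if p < alpha else "do not reject")
```

For a negative z, the "greater" alternative gives a p-value above 0.5, so picking the wrong tail can flip a reject decision even when the statistic itself is computed correctly.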