Hello!
Could someone help me? I'm working on the "final assignment" of the second week, and everything works except the prevalence calculation.
I don't really know what to change in my code or how to avoid using global variables.

# UNQ_C3 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)

def get_prevalence(y):
    """
    Compute prevalence.

    Args:
        y (np.array): ground truth, size (n_examples)
    Returns:
        prevalence (float): prevalence of positive cases
    """
    prevalence = 0.0
    # Data analyzed
    y = valid_results[class_labels].values
    # Total number of observations
    N = len(y)
    # Positive cases
    TP = (y != 0).sum()
    # Prevalence
    prevalence = TP/N
    ### END CODE HERE ###
    return prevalence

~/work/W2A1/test_utils.py in multiple_test(test_cases, target)
    119     print('\033[92m', success," Tests passed")
    120     print('\033[91m', len(test_cases) - success, " Tests failed")
--> 121     raise AssertionError("Not all tests were passed for {}. Check your equations and avoid using global variables inside the function.".format(target.__name__))
    122
    123

AssertionError: Not all tests were passed for get_prevalence. Check your equations and avoid using global variables inside the function.

Actually, the automatic grader is expecting numpy.mean, so please use it rather than an equally valid but different way of calculating the prevalence. =)

So I think it's best to understand how np.mean works. From there you can either write an equivalent computation yourself, if you want to, or simply use np.mean.
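To make the connection concrete, here is a minimal sketch of why np.mean gives the prevalence for 0/1 labels: the mean of such an array is the number of ones divided by the number of examples. The array y_example is made up for illustration and is not part of the assignment.

```python
import numpy as np

def get_prevalence(y):
    # For labels that are only 0 or 1, the mean equals
    # (number of positive cases) / (number of examples).
    return np.mean(y)

# Hypothetical labels: 4 positives out of 10 examples.
y_example = np.array([1, 0, 0, 1, 1, 0, 0, 0, 0, 1])

# The same quantity computed by counting, for comparison.
manual = (y_example == 1).sum() / len(y_example)

print(get_prevalence(y_example), manual)
```

Both values should agree, which is why the two approaches are interchangeable on binary labels even if the grader only accepts one of them.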

It looks like there is an issue with the calculation of prevalence in the get_prevalence function. The error message suggests that not all tests were passed and to check your equations and avoid using global variables inside the function.

The prevalence is the number of positive cases divided by the total number of examples. In the provided code, y is reassigned to the values of the valid_results dataframe, so the input y passed to the function is ignored; the function should use that input instead. Also, the positive cases are counted with (y != 0).sum(), which treats every nonzero value as positive. For labels that are strictly 0 or 1 this gives the same result as (y == 1).sum(), but the latter states the intent more clearly.

Here is an example of how the get_prevalence function could be implemented:

def get_prevalence(y):
    # Total number of observations
    N = len(y)
    # Positive cases
    TP = (y == 1).sum()
    # Prevalence
    prevalence = TP/N
    return prevalence

You should also avoid using global variables inside the function. Relying only on the function's arguments is best practice and makes your code more maintainable, readable, and testable.
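As a rough illustration of why a test suite trips over globals, the sketch below compares a function that overwrites its argument with a notebook-level variable against one that uses only its argument. The names labels_global, prevalence_with_global, prevalence_from_argument and test_input are hypothetical and chosen only for this example.

```python
import numpy as np

# Hypothetical notebook-level variable, standing in for data loaded earlier.
labels_global = np.array([1, 1, 1, 0])

def prevalence_with_global(y):
    # Bug: the argument is overwritten by the global, so the input is ignored.
    y = labels_global
    return np.mean(y)

def prevalence_from_argument(y):
    # Uses only the data that was passed in.
    return np.mean(y)

# A test harness calls the function with its own data.
test_input = np.array([0, 0, 0, 1])

print(prevalence_with_global(test_input))    # reflects labels_global
print(prevalence_from_argument(test_input))  # reflects test_input
```

The first function returns the same number no matter what it is given, so any test that passes different data will fail.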

Rather than reading valid_results inside the function, extract the required class_labels columns from the dataframe outside it, and pass the resulting array to the function as y.
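A minimal sketch of that calling pattern follows. It assumes pandas is available, and the dataframe contents and column names are made up for illustration; the real valid_results and class_labels come from the assignment notebook.

```python
import numpy as np
import pandas as pd

def get_prevalence(y):
    # Works on whatever array is passed in; no globals involved.
    return np.mean(y)

# Hypothetical stand-ins for the notebook's valid_results and class_labels.
valid_results = pd.DataFrame({
    "Cardiomegaly": [1, 0, 0, 1],
    "Edema": [0, 0, 0, 1],
})
class_labels = ["Cardiomegaly", "Edema"]

# Extract each label column outside the function, then pass it in as y.
prevalences = {
    label: get_prevalence(valid_results[label].values)
    for label in class_labels
}

print(prevalences)
```

Keeping the dataframe lookup at the call site means the function can be tested with any array, which is what the grader does.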

Also, it’s recommended to check the input data and the function calls to ensure that the correct data is being passed to the get_prevalence function.