Detecting 0 anomalies for large feature set in 2nd lab C3W1

Geoffrey_Blum · July 11, 2023, 6:16am

Since posting code snippets is against the rules, I’ll try to describe the situation as best I can.

In the final section of the anomaly detection lab at the end of C3_W1, your code is applied to a larger data set with 11 features. My first time through the lab, I didn’t follow the hints for “select_threshold”. Instead, I built arrays of float 1s and 0s using broadcast multiplication, subtraction, and np.ceil, rather than Boolean arrays from element-wise logic. The function passed the test cell immediately below it, but when I ran the final code cell in the lab (the one that applies select_threshold to the larger data set), it identified 0 anomalies, chose epsilon=0, and calculated F1=0.

The grader compiled the code and gave a passing score, but I couldn’t figure out why the last cell doesn’t work with this version of select_threshold. I also tried implementing slightly different versions of the algebraic method using integer arithmetic instead of floats, but that didn’t change the result. Modifying the approach by using np.abs instead of multiplying by -1 causes the final code cell to correctly produce the number of anomalies and epsilon, but it calculates F1 to be very small (~0.008). Of course, when I changed the code to use element-wise Boolean tests instead, that fixed the output of the final cell to be the expected result.

Any ideas on what’s going on?

rmwkwok · July 11, 2023, 8:23am

Hello @Geoffrey_Blum,

From your description, you were able to produce the expected result. If I were you, I would:

add some print lines to show the values of all intermediate variables, and
plan how I would change the code from the version that produces the expected result to the version I have questions about.

Then I would go through the plan, changing the code step by step, and watch for when and where one or more intermediate variables stop behaving as expected. That way I can narrow the problem down to, hopefully, the one line that behaves unexpectedly, and continue my investigation from there.
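As a rough illustration of the print-based approach, here is a minimal sketch assuming NumPy arrays. The helper name `describe` and the toy arrays are hypothetical and are not part of the lab. Reporting each intermediate's dtype and value range along with its name tends to surface type-related surprises quickly:

```python
import numpy as np

def describe(name, arr):
    """Return a one-line summary of an intermediate array (hypothetical helper)."""
    arr = np.asarray(arr)
    return (f"{name}: dtype={arr.dtype}, shape={arr.shape}, "
            f"min={arr.min()}, max={arr.max()}")

# Toy stand-ins for intermediate variables; these are not the lab's data.
labels = np.array([0, 1, 0, 0], dtype=np.uint8)
scores = np.array([0.30, 0.01, 0.25, 0.40])

print(describe("labels", labels))
print(describe("scores", scores))
```

Calling the same helper after each change in the plan makes it easy to compare the working and non-working versions side by side.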

Good luck!

Raymond

Geoffrey_Blum · July 11, 2023, 4:28pm

Thanks, @rmwkwok! Your suggestion gave me an idea, and in retrospect, it seems kind of obvious.

I’ve checked, and this is the problem: the validation labels (y_val) are passed as an array of unsigned integers, while p_val is an array of floats. Calculations of the form “(y_val - 1) * -1” therefore started in unsigned arithmetic, where 0 - 1 wraps around to a large positive value instead of giving -1, so the result was not the flipped labels I intended.
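The wraparound can be reproduced on toy data. The sketch below is only an illustration, not the lab's code: the uint8 dtype, the array values, and all variable names are assumptions. How the later multiplication by -1 is typed depends on the NumPy version, so the sketch stops at the subtraction and uses the np.abs variant mentioned earlier in the thread.

```python
import numpy as np

# Toy labels; uint8 is an assumption about the dtype of the lab's y_val.
y_uint = np.array([0, 1, 0, 0], dtype=np.uint8)
y_int = y_uint.astype(np.int64)

# Subtracting 1 from an unsigned 0 wraps around to the dtype's maximum.
wrapped = y_uint - np.uint8(1)
signed = y_int - 1
print(wrapped.tolist())  # [255, 0, 255, 255]
print(signed.tolist())   # [-1, 0, -1, -1]

# Toy float predictions (1.0 = flagged as anomaly), as in the algebraic approach.
pred = np.array([1.0, 0.0, 0.0, 1.0])

# "False positives" computed algebraically from the wrapped values are
# inflated by a factor of 255, which would shrink precision and F1.
fp_wrapped = np.sum(pred * np.abs(wrapped))
# Boolean comparisons do not depend on the integer dtype of the labels.
fp_bool = np.sum((pred == 1) & (y_uint == 0))
print(fp_wrapped, fp_bool)  # 510.0 2
```

An inflated false-positive count of this kind is one plausible reason the np.abs variant found the right epsilon but reported a very small F1.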

Notably, I was getting the correct values in the test cell because

(a) the correct epsilon happens to have 0 false positives (which is the only calculation that uses the uint arithmetic), meaning that the rest of the calculations work correctly, and

(b) the tests to confirm correctness pass y_val as an array of ints (rather than uints).

I would encourage an update to the lab which changes either the validation labels used in the lab or in the tests to match one another (i.e. either y_val is always int or always uint).
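Until the lab and its tests agree on a dtype, one defensive option on the learner's side is to normalize the labels before doing any arithmetic on them. This is a minimal sketch under stated assumptions: `as_signed_labels` is a hypothetical helper, not something the lab provides, and it assumes the labels are small values such as 0 and 1 that fit comfortably in int64.

```python
import numpy as np

def as_signed_labels(y):
    """Cast unsigned-integer labels to int64; return other dtypes unchanged.

    Hypothetical helper; assumes label values are small (e.g. 0 and 1).
    """
    y = np.asarray(y)
    if np.issubdtype(y.dtype, np.unsignedinteger):
        return y.astype(np.int64)
    return y

# Toy labels standing in for y_val; uint8 is an assumption.
y_val_toy = np.array([0, 1, 0, 0], dtype=np.uint8)
y_signed = as_signed_labels(y_val_toy)

# With signed labels, the algebraic flip gives the intended 1s and 0s.
flipped = (y_signed - 1) * -1
print(flipped.tolist())  # [1, 0, 1, 1]
```

Using element-wise Boolean comparisons, as the lab's hints suggest, avoids the issue altogether because the comparisons do not depend on whether the labels are signed.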

TMosh · July 11, 2023, 5:26pm

Thanks for your analysis and suggestion.

TMosh · July 13, 2023, 4:10am

Update: I have submitted a ticket for the course staff to consider.
