C1_W1_Assignment - check_for_leakage

Hello Course Support Staff,

In Exercise 1, I am running into a problem with the check_for_leakage function.

check_for_leakage_test(check_for_leakage) passes if I use 'patient_id' as the column name in df1_patients_unique = df1['patient_id'].unique() (and likewise for df2) inside check_for_leakage(). However, the next cell,

print("leakage between train and valid: {}".format(check_for_leakage(train_df, valid_df, 'PatientId')))

only runs without an error if I use 'PatientId' there instead, i.e. df1_patients_unique = df1['PatientId'].unique() (and likewise for df2).

All other cells work fine after that. I have completed the rest of the notebook and can see the heat maps for the four images.

Exercises 2 and 3 both produce the expected outputs without any discrepancies.

Unfortunately, the autograder is not accepting any of my submissions because of this issue with Exercise 1. I have made three attempts.

Please advise. Thanks.


Note that they give you the name of the column of the Pandas DataFrame as an argument to the function. Maybe you should try using that instead of “hard-coding” the column name. Generally speaking, it’s always a mistake to hard-code things like that unless you are literally forced to or the instructions explicitly tell you to. Why would they go to the trouble of passing the name as an argument if they didn’t intend for you to use it? Perhaps the test function uses a different dataset that has different column names?

And note that it is a Python variable, so don’t “quote it” as if it were a string. The value of that variable is a string, right?
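A minimal sketch of what using the argument looks like. It assumes the function takes the two DataFrames plus the column name as a third argument; the parameter name patient_col is a hypothetical choice, so use whatever name the notebook's signature gives. This is not the course's reference solution, just an illustration that the column name is read from the variable rather than written as a string literal:

```python
import pandas as pd


def check_for_leakage(df1, df2, patient_col):
    """Return True if any patient ID appears in both DataFrames.

    patient_col is a hypothetical parameter name. It holds the column
    name as a string, so it is used as df1[patient_col], not df1['patient_col'].
    """
    # Unique patient IDs in each DataFrame, looked up via the argument
    df1_patients_unique = set(df1[patient_col].unique())
    df2_patients_unique = set(df2[patient_col].unique())

    # Leakage means at least one ID is shared between the two sets
    patients_in_both_groups = df1_patients_unique & df2_patients_unique
    return len(patients_in_both_groups) > 0
```

Because the column name comes from the caller, the same function works for a DataFrame keyed on 'patient_id' and for one keyed on 'PatientId'.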


Also note that “patient_id” and “PatientId” are not the same string in Python, right? Maybe that’s why they made the column name a variable. We’re not writing English literature here, we’re programming, so you can’t just vary things to suit your personal style. Every character matters, and things that would be matters of “style” in English, like punctuation and indentation, matter very literally in Python. Python is also case-sensitive, and the difference between ; and : can ruin everything.
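A quick illustration of that sensitivity, using a small made-up DataFrame that has the notebook's 'PatientId' column (the data values are invented):

```python
import pandas as pd

# String comparison in Python is exact: case and underscores both count
print("patient_id" == "PatientId")  # False
print("PatientId" == "PatientId")   # True

# Toy DataFrame using the column name from the notebook's data
df = pd.DataFrame({"PatientId": [10, 11, 12]})

# Looking up a column under the wrong spelling raises a KeyError
try:
    df["patient_id"]
except KeyError as err:
    print("KeyError for column:", err)
```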


Thank you, Paul in PaloAlto.

I have it working now, with your guidance.

Yes, I overlooked that basic point :slight_smile: Something else I did caused an error, and in the long error output I saw a reference to patient_id, so as a trial I hard-coded it. The output of the first exercise then matched the expected output, and then, obviously, the next cell failed.

But I am still quite confused about one thing. When I list the columns, I see:

Index(['Image', 'Atelectasis', 'Cardiomegaly', 'Consolidation', 'Edema',
       'Effusion', 'Emphysema', 'Fibrosis', 'Hernia', 'Infiltration', 'Mass',
       'Nodule', 'PatientId', 'Pleural_Thickening', 'Pneumonia',
       'Pneumothorax'],
      dtype='object')

I don’t see ‘patient_id’, yet hard-coding ‘patient_id’ still works for the first check in Exercise 1! That is not clear to me, and an explanation would be nice. (I have not created a ‘patient_id’ column either, so it can’t be something stored in memory.) Thank you.

  • Ananth Krishnan

Sorry, but I don’t have access to the course files for AI4M, so I can’t look. But my guess is that the DataFrame used by the test has different column names from the one used in the notebook itself. Why don’t you try listing the column names of the DataFrame where “patient_id” worked? From a purely scientific point of view, the results of your experiment suggest that it uses “patient_id”.
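A short sketch of that experiment: list the columns of whichever DataFrame you are working with and check which spelling is actually present. The two toy DataFrames below are made-up stand-ins for the notebook's data and a test's data, not the course files:

```python
import pandas as pd

# Hypothetical stand-ins: one shaped like the notebook's data, one like a test's
notebook_like = pd.DataFrame({"Image": ["a.png", "b.png"], "PatientId": [1, 2]})
test_like = pd.DataFrame({"patient_id": [0, 1, 2]})

for name, frame in [("notebook_like", notebook_like), ("test_like", test_like)]:
    # frame.columns is a pandas Index; list() makes it easier to read
    print(name, list(frame.columns))
    # The `in` operator checks membership against the column labels
    print("  has 'patient_id':", "patient_id" in frame.columns)
    print("  has 'PatientId':", "PatientId" in frame.columns)
```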

Or open the test file and look at the DataFrame it uses. Click “File → Open” to find the Python file with the test cases; you can deduce its name from the “import” cell earlier in the notebook.

As I theorized in my earlier reply, that most likely explains why they made that an argument to the function.


Yup! Here it is, in a function from the public test file:

def check_for_leakage_test(target):
    df1 = pd.DataFrame({'patient_id': [0, 1, 2]})
    df2 = pd.DataFrame({'patient_id': [2, 3, 4]})
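That explains the puzzle. Below is a sketch of why a hard-coded 'patient_id' passes this test but fails in the notebook. The function is a hypothetical, deliberately buggy variant written only to demonstrate the failure. The two test DataFrames copy the values from the test excerpt, and the 'PatientId' frames are made-up stand-ins for the notebook's train/valid data:

```python
import pandas as pd


def leakage_hard_coded(df1, df2, patient_col):
    # Deliberately buggy: ignores patient_col and hard-codes the column name
    ids1 = set(df1["patient_id"].unique())
    ids2 = set(df2["patient_id"].unique())
    return len(ids1 & ids2) > 0


# Same data as the public test excerpt: patient 2 appears in both frames
test_df1 = pd.DataFrame({"patient_id": [0, 1, 2]})
test_df2 = pd.DataFrame({"patient_id": [2, 3, 4]})
print(leakage_hard_coded(test_df1, test_df2, "patient_id"))  # True

# Made-up frames using the notebook's column name
train_like = pd.DataFrame({"PatientId": [0, 1, 2]})
valid_like = pd.DataFrame({"PatientId": [5, 6]})
try:
    leakage_hard_coded(train_like, valid_like, "PatientId")
except KeyError as err:
    print("KeyError:", err)
```

Reading the column name from the patient_col argument instead of the literal fixes both cases at once.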

Thanks for your help! See you around, Paul.


Science! Gotta love it. :nerd_face:
