Week 1, Assignment: UNQ_C1

Hi, I get this error when trying to do the check_for_leakage function:

AttributeError: ‘DataFrame’ object has no attribute ‘patient_col’

I’m confused because it’s one of the inputs and I use it when extracting the values from the df e.g. df1.patient_col.values

Could someone please assist?

It means that the DataFrame in question has no column with that name. Maybe there is a typo!

Thank you. Is this an isolated issue with just my work or are other students experiencing the same issue? I can’t seem to understand why it works for Test Case 1, but fails on test case 2.

I don’t have the statistics, but I don’t think it’s frequent!

This is a bug in your code. You must be using the wrong syntax to “dereference” the DataFrame. Note that patient_col is a variable containing the string name of the “key” for the DataFrame (which is indexed much like a python dictionary). So it will fail if you say:

myDf.patient_col

Because you would be treating patient_col as the name of an attribute or method of that class. Try this instead:

myDf[patient_col]

Of course I’m just making up the variable name for the DataFrame. That’s not the variable they use in that function. I’m not supposed to write the code for you: I’m just trying to explain the concept.
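
To make the difference concrete, here is a minimal sketch that is unrelated to the assignment code. The DataFrame my_df, its contents, and the column name "patient_id" are all made up for illustration; only the variable name patient_col comes from the discussion above.

```python
import pandas as pd

# Hypothetical data: the column name "patient_id" is an assumption for illustration.
my_df = pd.DataFrame({"patient_id": [0, 1, 2], "age": [54, 61, 47]})

# A variable that holds the column name as a string.
patient_col = "patient_id"

# Bracket indexing looks up the column whose name is the *value* of patient_col.
ids = my_df[patient_col].values

# Attribute access looks for something literally called "patient_col",
# which this DataFrame does not have, so pandas raises AttributeError.
try:
    my_df.patient_col
    attribute_access_failed = False
except AttributeError:
    attribute_access_failed = True
```

Bracket indexing keeps working whatever string is stored in patient_col, which is why it is the safer choice when the column name arrives as a function argument.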

Thank you so much! This makes sense

I faced the same AttributeError as the original poster, and I have passed the assignment UNQ_C1 by using your solution.
However, I am still not clear about when to use “myDf.patient_col” and when to use “myDf[patient_col]”.
Why do we have to treat the DataFrame like a python dictionary in this assignment?

A Pandas DataFrame is not a python dictionary. My point was just that the syntax of “indexing” a DataFrame is very similar: it is a “keyed” data structure with string names as the keys. If you are not used to object-oriented programming, perhaps you should spend some time studying that aspect of python. This reference:

myDf.patient_col

only makes sense if myDf is an instance of a python class that has an attribute or method declared with the name patient_col. In the particular function here, patient_col is a python variable whose value is a python string, so the above reference makes no sense.
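
As a rough illustration of when each form applies, the sketch below uses a made-up DataFrame (named df here) that really does have a column literally called "patient_col", plus a column called "count" chosen because it clashes with an existing DataFrame method. None of this is taken from the assignment.

```python
import pandas as pd

# Hypothetical DataFrame: both column names are assumptions for illustration.
df = pd.DataFrame({"patient_col": [10, 11], "count": [1, 2]})

# Attribute access works here only because a column is literally named "patient_col".
same_column = df.patient_col.equals(df["patient_col"])

# "count" is also the name of a DataFrame method, so attribute access
# returns that method rather than the column.
count_is_method = callable(df.count)

# Bracket indexing is unambiguous and always returns the column.
count_values = df["count"].tolist()
```

So the dot form is only a shortcut for a column name that you type out literally, that is a valid python identifier, and that does not collide with an existing attribute or method. The bracket form works in every case, including when the name is stored in a variable.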

The high level point here is that once you graduate to OOP, then python is just the framework: you really need to understand in detail the definition of the class(es) you are dealing with. In this case it is the Pandas DataFrame. Since Pandas is heavily used throughout the AI4M specialization, any effort you spend learning about that will not be wasted. Google is your friend.

Thank you for pointing me in the right direction. I will read more about OOP and the Pandas DataFrame.