Hi,
I’m stuck on the Course 3, Week 1 assignment, exercise 11 (# UNQ_C11, def treatment_dataset_split), at the part below.
# From the training set, get the labels of patients who received treatment
y_treat_train = y_train[X_treat_train.index]
Frankly I’m not 100% sure exactly what the question is asking for.
- Is it just to subset the input y_train numpy array to produce the output array y_treat_train, WITHOUT adding another column for labels? More specifically, is y_treat_train expected to have 1 column?
- If y_treat_train is expected to have an additional column for labels, then what should the labels be? The text cell below uses a column called ID, but in later cells the actual data X_dev doesn’t have such an ID column. So it doesn’t seem correct to hardcode the function so that the labels of y_treat_train come from an ID column. This is VERY confusing to me.
- I tried using index as shown in the snippet above, but it failed because the input y_train is an array without an index, unlike the input X_train. Normally, as in the rest of the assignment, inputs X and y should have the same index, and both should be dataframes.
- Based on the description of the function def treatment_dataset_split, the output y_treat_train is expected to be an np.array. However, in later cells, the output of treatment_dataset_split is used as input for holdout_grid_search, which expects y_treat_train to be a 1-column dataframe. So this is another confusing inconsistency to me.
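To make the indexing problem concrete, here is a minimal sketch with made-up data. The column name TRTMT, the index labels, and all values are my own assumptions for illustration, not taken from the assignment. It shows why using a dataframe's index labels on a plain numpy array can fail, along with one possible positional workaround (a boolean mask), which may or may not be what the exercise intends.

```python
import numpy as np
import pandas as pd

# Made-up data (hypothetical): X_train keeps non-contiguous row labels,
# as it might after a train/test split, while y_train is a plain NumPy
# array that only knows positions 0..n-1.
X_train = pd.DataFrame(
    {"TRTMT": [1, 0, 1, 0], "age": [61, 47, 55, 70]},
    index=[10, 42, 7, 99],
)
y_train = np.array([1, 0, 0, 1])

# Rows of patients who received treatment (index labels 10 and 7).
X_treat_train = X_train[X_train["TRTMT"] == 1]

# NumPy treats the labels 10 and 7 as positions, which are out of
# bounds for an array of length 4.
try:
    y_train[X_treat_train.index]
    label_indexing_failed = False
except IndexError:
    label_indexing_failed = True

# Positional alternative: build a boolean mask from X_train and apply
# it to y_train, so no index labels are involved.
treat_mask = (X_train["TRTMT"] == 1).to_numpy()
y_treat_train = y_train[treat_mask]

print(label_indexing_failed)  # True
print(y_treat_train)          # [1 0]
```

Note that if X_train happened to have a default 0..n-1 index, the original snippet would work by coincidence, since labels and positions would then agree.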
Hopefully this makes some sense. This truly is quite frustrating, and I’d appreciate any clarification here.
Thanks
Huan