columns_except_Systolic_BP = [col for col in X_train.columns if col not in ['Systolic BP']]
for col in columns_except_Systolic_BP:
    sns.distplot(X_train.loc[:, col], norm_hist=True, kde=False, label='full data')
    sns.distplot(dropped_rows.loc[:, col], norm_hist=True, kde=False, label='without missing data')
    plt.legend()
    plt.show()
I have the following doubts:
The labels say 'full data' and 'without missing data'. Don't both mean the same thing?
The label says 'without missing data', but you are only plotting dropped_rows. That would mean the orange distribution shows the rows that were dropped, i.e., the rows for which Systolic BP is missing. Please correct me if I am wrong.
The interpretation section says: "But when considering the age covariate, we see that much more data tends to be missing for patients over 65. The reason could be that blood pressure was measured less frequently for old people to avoid placing additional burden on them." As I understand it, though, data is missing mostly for patients under 45: in the orange distribution the frequency of those patients is higher, while the frequency of patients over 65 is lower, which should mean that most patients over 65 have their Systolic BP recorded. Please correct me if my interpretation is wrong.
Here, 'full data' means the data that still includes the missing values. The statement below is from the assignment you have doubts about:
We see that our train and validation sets have missing values, but luckily our test set has complete cases.
The next statement then says:
As a first pass, we will begin with a complete case analysis, dropping all of the rows with any missing data. Run the following cell to drop these rows from our train and validation sets.
So the statement above tells us that there is a filtered dataset from which every row with any missing data has been dropped, and that is how the two differ.
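To make the distinction concrete, here is a minimal sketch on a made-up DataFrame. The names toy, has_missing, complete_cases, and rows_with_missing are hypothetical and are not taken from the assignment. The assignment's own definition of dropped_rows is not shown in this thread, so the sketch simply builds both subsets side by side: the rows that survive a complete case analysis and the rows that it discards.

```python
import numpy as np
import pandas as pd

# Made-up toy data; the values are illustrative only.
toy = pd.DataFrame({
    "Age": [30, 42, 55, 67, 71, 80],
    "Systolic BP": [120.0, np.nan, 135.0, np.nan, 150.0, np.nan],
})

# Boolean mask: True for every row that has at least one missing value.
has_missing = toy.isnull().any(axis=1)

# Rows kept by a complete case analysis (same result as toy.dropna()).
complete_cases = toy[~has_missing]

# Rows that a complete case analysis discards.
rows_with_missing = toy[has_missing]

print(len(toy), len(complete_cases), len(rows_with_missing))  # 6 3 3
print(rows_with_missing["Age"].tolist())  # [42, 67, 80]
```

Printing the row counts, or checking a subset with .isnull().any().any(), is a quick way to confirm which of the two subsets a variable actually holds before reading a plot label.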
That understanding is incorrect: the orange distribution is the data from which the rows with missing data have been dropped. I am sharing the table that shows how the columns look after dropping those rows.