Hello Course Moderators,
This was an excellent course. I learned quite a bit that was completely new for me, and it was very well presented.
I would like to suggest a couple of improvements to Exercise 3 for future versions.
Section 7 of this final assignment needs to be updated: it does not follow the simple method for calculating Harrell’s C-index presented in the video lessons.
I completed the assignment by following the simple, intuitive guidance provided in the video lessons.
- The starter code for this section is unnecessarily complex, and one part of it (see below) is plainly incorrect: if the first condition is implemented as an XOR (because of the “at most” wording in the comment), the condition nested inside it can never be true:
# check if at most one is censored
if None:
    pass
    # check if neither are censored
    if None:
        pass
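The reachability point can be checked by enumerating all four censoring combinations for a pair. This is only a minimal sketch: the booleans `censored_a` and `censored_b` and the helper `reachable_inner` are hypothetical names for illustration, not taken from the starter code. It compares an XOR outer condition ("exactly one censored") with a literal "not both censored" outer condition.

```python
from itertools import product


def reachable_inner(outer):
    """List the (censored_a, censored_b) pairs for which the outer
    condition AND the nested 'neither censored' condition both hold."""
    hits = []
    for censored_a, censored_b in product([False, True], repeat=2):
        neither_censored = (not censored_a) and (not censored_b)
        if outer(censored_a, censored_b) and neither_censored:
            hits.append((censored_a, censored_b))
    return hits


# Outer condition written as XOR: exactly one of the two is censored.
xor_hits = reachable_inner(lambda a, b: a != b)

# Outer condition written as "not both censored" (zero or one censored).
at_most_one_hits = reachable_inner(lambda a, b: not (a and b))

print(xor_hits)          # []
print(at_most_one_hits)  # [(False, False)]
```

With the XOR form the nested branch is never entered, whereas with the "not both" form it is entered only for the pair in which both deaths were observed.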
- Also, the section below, found in the Jupyter notebook right above the exercise, does not match the simple presentation in the video. It needs to be updated as well:
To evaluate how well our model is performing, we will write our own version of the C-index. Similar to the week 1 case, C-index in the survival context is the probability that, given a randomly selected pair of individuals, the one who died sooner has a higher risk score.
However, we need to take into account censoring. Imagine a pair of patients, 𝐴 and 𝐵.
Scenario 1
• A was censored at time 𝑡𝐴
• B died at 𝑡𝐵
• 𝑡𝐴<𝑡𝐵.
Because of censoring, we can’t say whether 𝐴 or 𝐵 should have a higher risk score.
Scenario 2
Now imagine that 𝑡𝐴>𝑡𝐵.
• A was censored at time 𝑡𝐴
• B died at 𝑡𝐵
• 𝑡𝐴>𝑡𝐵
Now we can definitively say that 𝐵 should have a higher risk score than 𝐴, since we know for a fact that 𝐴 lived longer.
Therefore, when we compute our C-index
• We should only consider pairs where at most one person is censored
• If one person is censored, then their censoring time should occur after the other person’s time of death.
The metric we get if we use this rule is called Harrell’s C-index.
Note that in this case, being censored at time 𝑡 means that the true death time was some time AFTER time 𝑡 and not at 𝑡.
• Therefore if 𝑡𝐴=𝑡𝐵 and A was censored:
○ Then 𝐴 actually lived longer than 𝐵.
○ This will affect how you deal with ties in the exercise below!
Thank you,
Ananth Krishnan