Question: Inconsistent survival times in TCGA clinicalMatrix file?
6 weeks ago
english.server wrote:

While trying to analyze survival of glioblastoma patients, I came across the following values in the downloaded GBM_clinicalMatrix file:

sampleID           CDE_survival_time    CDE_vital_status    days_to_last_followup
TCGA.06.5859.01    138                  LIVING              139
TCGA.27.1831.01    504                  DECEASED            505

I am wondering whether, in my survival analysis, I should use 139 instead of 138 and 504 instead of 505. I do not understand how the last follow-up can be one day after death for a deceased patient, or why the last follow-up does not "update" the survival time for a living patient. Am I wrong, or is there a mistake (or two) in the downloaded data?

Thank you in advance

written 6 weeks ago by english.server
6 weeks ago
Kevin Blighe wrote:

From where did you obtain this data? I just looked at the data on the GDC Legacy Data Portal and there is no discrepancy:

bcr_patient_barcode vital_status    last_contact_days_to    death_days_to
bcr_patient_barcode vital_status    days_to_last_followup   days_to_death
CDE_ID:2673794      CDE_ID:5        CDE_ID:3008273          CDE_ID:3165475
TCGA-27-1831        Dead            505                     505
TCGA-06-5859        Alive           139                     [Not Applicable]

From what I understand from using the TCGA data since ~2014, they do not calculate an actual 'survival time' in the main data, which leads me to believe that you are using some third-party re-processed data that seems to be erroneous, but I could be wrong, of course.
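Since the main TCGA clinical data give only the raw fields, the survival time and event indicator have to be derived from them. Below is a minimal sketch of one common convention, not an official TCGA procedure: deceased patients get `days_to_death` with event = 1, and living patients are censored at `days_to_last_followup` with event = 0. The function name `overall_survival` and the handling of placeholders such as `[Not Applicable]` are my own assumptions for illustration.

```python
def overall_survival(vital_status, days_to_last_followup, days_to_death):
    """Return (time_in_days, event): event is 1 for death, 0 for censored.

    Hypothetical helper; the convention used here is an assumption.
    """
    def to_days(value):
        # Placeholders such as "[Not Applicable]" are treated as missing.
        try:
            return int(value)
        except (TypeError, ValueError):
            return None

    if vital_status.strip().lower() in ("dead", "deceased"):
        # Event observed: time runs to death (None if the value is missing).
        return to_days(days_to_death), 1
    # Still alive: censor at the last follow-up.
    return to_days(days_to_last_followup), 0


# The two patients from the table above:
# (barcode, vital_status, days_to_last_followup, days_to_death)
rows = [
    ("TCGA-27-1831", "Dead", "505", "505"),
    ("TCGA-06-5859", "Alive", "139", "[Not Applicable]"),
]

survival = {
    barcode: overall_survival(status, followup, death)
    for barcode, status, followup, death in rows
}
print(survival)
```

With these two rows, the deceased patient ends up with a time of 505 days and an observed event, and the living patient is censored at 139 days.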


written 6 weeks ago by Kevin Blighe

First, thank you very much for the effort you put into answering this question. I downloaded the data from the Xena browser.

The problem was due to the wrong column:


I chose, instead of the correct one:


Thanks again

written 6 weeks ago by english.server

Sure thing

written 6 weeks ago by Kevin Blighe
