Question: Interpretation of TCGA clinical data
bxia wrote (3.4 years ago):

When I parse the XML file from TCGA, I see something like "age_at_initial_diagnosis", 63, precision = day. But when I check the TCGA website, it says the patient is 63 years old...

There are also several numbers in the XML related to the last follow-up day. Which one is correct? Some patients seem to have been followed up more than once, but when I did the calculation, the numbers do not match...
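Two checks that may help, as a minimal sketch under stated assumptions. First, TCGA clinical records usually also carry days_to_birth, assumed here to be a negative day count relative to diagnosis, so dividing it by 365.25 gives an approximate age in years to compare against the age field. Second, when a patient has several follow-up sections, one common heuristic (not an official TCGA rule) is to use the death time if one is recorded, otherwise the largest last-follow-up time. The function names are hypothetical, and the values are assumed to have already been pulled out of the XML.

```python
from typing import Optional, Sequence


def age_in_years(days_to_birth: int) -> float:
    # Assumes days_to_birth is negative (days from birth back to diagnosis).
    return -days_to_birth / 365.25


def overall_survival(
    days_to_death: Sequence[Optional[int]],
    days_to_last_followup: Sequence[Optional[int]],
) -> tuple[Optional[int], bool]:
    """Collapse values from the main record and every follow-up section.

    Heuristic (an assumption, not TCGA's definition): if any death time is
    recorded, return the largest one with died=True; otherwise return the
    largest last-follow-up time with died=False.
    """
    deaths = [d for d in days_to_death if d is not None]
    if deaths:
        return max(deaths), True
    followups = [f for f in days_to_last_followup if f is not None]
    return (max(followups) if followups else None), False
```

For example, a patient with follow-up times 365 and 730 and no recorded death would be summarized as (730, False).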




jan replied (3.4 years ago):

Can I ask how you parsed the XML files?

Charles Warden replied (12 months ago):

Python has XML (and JSON) libraries that you can import, which you might find helpful:
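As an illustration of the Python route, here is a minimal sketch using the standard-library xml.etree.ElementTree module. TCGA clinical XML uses namespaced tags, which ElementTree stores as "{uri}name", so the sketch matches on the local name only and collects every occurrence of a field, including those nested inside follow-up sections. The element name days_to_last_followup, the namespace URIs and the SAMPLE record are hypothetical stand-ins; check the tag names in your own files.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Optional


def local_name(tag: str) -> str:
    # ElementTree stores namespaced tags as "{uri}name"; keep only "name".
    return tag.rsplit("}", 1)[-1]


def find_fields(root: ET.Element, field: str) -> Iterator[tuple[str, dict]]:
    """Yield (text, attributes) for every element named `field`, at any depth."""
    for elem in root.iter():
        if local_name(elem.tag) == field:
            yield (elem.text or "").strip(), dict(elem.attrib)


def to_int(text: str) -> Optional[int]:
    # Empty or non-numeric text becomes None.
    try:
        return int(text)
    except ValueError:
        return None


# Hypothetical, heavily simplified record; real TCGA files use other URIs.
SAMPLE = """<patient xmlns:shared="http://example.org/shared"
         xmlns:fu="http://example.org/followup">
  <shared:days_to_last_followup precision="day">365</shared:days_to_last_followup>
  <fu:follow_up>
    <shared:days_to_last_followup precision="day">730</shared:days_to_last_followup>
  </fu:follow_up>
  <fu:follow_up>
    <shared:days_to_last_followup precision="day"></shared:days_to_last_followup>
  </fu:follow_up>
</patient>"""

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for text, attrs in find_fields(root, "days_to_last_followup"):
        print(to_int(text), attrs.get("precision"))
```

For a real file, ET.parse(path).getroot() gives the root element to pass to find_fields. Matching on local names keeps the code independent of the schema version embedded in the namespace URIs.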
kristoffer.vittingseerup (European Union) answered (12 months ago):

I would suggest using the data provided by the TCGA Clinical Data Resource (TCGA-CDR), which has been manually curated and merged. It is described in this paper and should solve many of these problems.

igor (United States) answered (12 months ago):

I personally find UCSC Xena the easiest way to download TCGA-related data. All the datasets are aggregated into simple tables with consistent sample names.

For example, the Pan-Cancer data is here:
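Because the sample names follow TCGA barcodes, sample-level tables (one row per sample, assumed here to look like TCGA-02-0001-01) can be linked to patient-level clinical tables by truncating each barcode to its first three dash-separated fields. A minimal sketch; the helper names and toy dictionaries are hypothetical, and both tables are assumed to be loaded into Python dictionaries already.

```python
def patient_barcode(sample_id: str) -> str:
    """Truncate a TCGA sample barcode to the patient part, e.g. TCGA-02-0001."""
    parts = sample_id.split("-")
    if len(parts) < 3 or parts[0] != "TCGA":
        raise ValueError(f"not a TCGA barcode: {sample_id!r}")
    return "-".join(parts[:3])


def attach_clinical(
    sample_values: dict[str, float], clinical: dict[str, dict]
) -> dict[str, dict]:
    """Join sample-level values to patient-level clinical rows.

    Samples whose patient has no clinical row are dropped.
    """
    joined = {}
    for sample_id, value in sample_values.items():
        row = clinical.get(patient_barcode(sample_id))
        if row is not None:
            joined[sample_id] = {**row, "value": value}
    return joined
```

Note that a patient with both a primary tumor (-01) and a solid tissue normal (-11) sample maps to the same clinical row, so you may want to filter on the sample type code before joining.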


