Question: Interpretation of TCGA clinical data
2.8 years ago by
bxia140 wrote:

When I parse the XML files from TCGA, I see something like "age_at_initial_diagnosis", 63, precision = day, but when I check the TCGA website, the patient is 63 years old...

There are also several numbers in the XML for the last follow-up day. Which number is correct? Some patients look like they were followed up more than once, but when I do the calculation, the numbers do not match...


rna-seq • 1.8k views
modified 4 months ago by igor7.6k • written 2.8 years ago by bxia140


Can I just ask how you parsed the XML files?

written 2.7 years ago by jan90

Python has some XML (and JSON) libraries that can be imported, which you might find helpful:
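
A minimal sketch of that idea, using the standard-library xml.etree.ElementTree module. The embedded XML is a made-up, simplified stand-in for a TCGA clinical file (the namespaces, nesting, and values are assumptions, not the real schema), and the helpers local_name and collect_values are hypothetical names. Elements are matched by local tag name so namespace prefixes do not matter, and every days_to_last_followup value is listed so that repeated follow-up entries become visible:

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for a TCGA clinical XML file.
# Namespaces, nesting, and values are illustrative, not the real schema.
SAMPLE_XML = """<tcga_bcr xmlns:clin="http://example.org/clin" xmlns:fu="http://example.org/followup">
  <patient>
    <clin:age_at_initial_diagnosis precision="day">63</clin:age_at_initial_diagnosis>
    <clin:days_to_last_followup>300</clin:days_to_last_followup>
    <follow_ups>
      <fu:follow_up>
        <clin:days_to_last_followup>720</clin:days_to_last_followup>
      </fu:follow_up>
      <fu:follow_up>
        <clin:days_to_last_followup/>
      </fu:follow_up>
    </follow_ups>
  </patient>
</tcga_bcr>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def collect_values(root, field):
    """Return (text, attributes) for every non-empty element named `field`."""
    hits = []
    for elem in root.iter():
        if local_name(elem.tag) == field and elem.text and elem.text.strip():
            hits.append((elem.text.strip(), dict(elem.attrib)))
    return hits


root = ET.fromstring(SAMPLE_XML)

# Keep the attributes too, so a 'precision' attribute can be inspected.
age = collect_values(root, "age_at_initial_diagnosis")

# Gather every follow-up value instead of only the first one found.
followups = [int(value) for value, _ in collect_values(root, "days_to_last_followup")]

# Assumption: treat the largest value as the most recent follow-up.
latest_followup = max(followups) if followups else None

print(age)
print(followups, latest_followup)
```

Taking the largest value as the most recent follow-up is only one possible convention; nothing in this thread confirms that it is the right one for a given analysis.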

written 4 months ago by Charles Warden6.5k
4 months ago by
European Union
kristoffer.vittingseerup1.7k wrote:

I would suggest using the data provided by the TCGA CDR (Clinical Data Resource), which has been manually curated and concatenated. It is described in this paper and should solve many of those problems.

4 months ago by
United States
igor7.6k wrote:

I personally find that Xena is the easiest way to download TCGA-related data. All the datasets are aggregated in a basic table format with consistent sample names.
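
As a rough sketch of why a flat table is easier to work with than the XML, the snippet below reads a small tab-delimited table with the standard-library csv module and indexes the rows by sample name. The inline table, its column names (sample, age, OS.time), the sample barcodes, and the helper load_table are all hypothetical placeholders, not the contents of any real download:

```python
import csv
import io

# Hypothetical miniature of a tab-delimited clinical table.
# Column names, barcodes, and values are made up for illustration.
TABLE = (
    "sample\tage\tOS.time\n"
    "TCGA-XX-0001-01\t63\t720\n"
    "TCGA-XX-0002-01\t58\t\n"
)


def load_table(handle, key="sample"):
    """Read a tab-delimited table into a dict keyed by the `key` column."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[key]: row for row in reader}


# For a real file, pass an open file handle instead of io.StringIO.
clinical = load_table(io.StringIO(TABLE))

print(clinical["TCGA-XX-0001-01"]["OS.time"])
print(sorted(clinical))
```

Values come back as strings, and missing entries come back as empty strings, so they need converting and checking before any calculation.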

For example, the Pan-Cancer data is here:
