Question: Survival Analysis Using TCGA Data
jack wrote:

I'm using TCGA gene expression data, and at one point in my work I need to do survival analysis. Is there any way to get information from TCGA that would let me do survival analysis on the samples for which I have gene expression data?

modified 21 months ago by xushutan • written 5.0 years ago by jack
dirigible2012 wrote:

I'm currently in the middle of something similar - the TCGA Bioinformatics team very kindly helped me out.

If you want to get the raw data yourself, it is in the "Clinical" data. These can be downloaded as text or XML - I've mostly looked at the XML files. I believe there is normally one file for the patient, and one file for every sample taken. (Normally there's just one sample, obtained at time of surgery.)

The problem is that dates in the clinical data, such as date of death, have been redacted to preserve patient privacy. I think that all dates have been replaced with values giving the number of days since original diagnosis.

If you just want a survival curve, you are looking for the number under the XML tag "days_to_death".

The day the particular sample was taken is under "days_to_sample_procurement" (i.e. the number of days between diagnosis and sample procurement). I think you could find other useful numbers by searching for "days_to".
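
Searching for "days_to" can also be scripted. The following is a minimal base-R sketch, assuming the XML file has been read into a single string and that element names carry a namespace prefix such as "shared:". The helper name list_days_to_tags is hypothetical, and a regular expression stands in for a real XML parser, so treat the output as a quick inventory rather than a validated parse.

```r
# Hypothetical helper: list the distinct "days_to_*" element names found in a
# TCGA clinical XML string. Plain regex search, not a real XML parser; assumes
# names look like <prefix:days_to_...> (the prefix, e.g. "shared", varies).
list_days_to_tags <- function(xml_text) {
  # Match opening tags only: "<" followed directly by a letter or underscore,
  # so closing tags ("</...") are skipped.
  hits <- regmatches(xml_text,
                     gregexpr("<[A-Za-z_]+:days_to_[A-Za-z_]+", xml_text))[[1]]
  sort(unique(sub("^<", "", hits)))
}

# Usage (the file name is a placeholder):
# xml_text <- paste(readLines("patient_clinical.xml"), collapse = "\n")
# list_days_to_tags(xml_text)
```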

Hope this helps,

Stephanie

written 5.0 years ago by dirigible2012

Thanks, but which value should I extract? I've looked at the XML file and found the line with the tag "days_to_death". It looks like this:

<shared:days_to_death precision="day" xsd_ver="1.12" tier="1" cde="3165475" owner="TSS" procurement_status="Not Applicable"/> <shared:days_to_last_followup precision="day" xsd_ver="1.12" tier="1" cde="3008273" owner="TSS"

written 5.0 years ago by jack

That's interesting. I presume the XML file works like an HTML file, so you want the value between the two tags. (I've replaced the angle brackets with square ones because Biostar interprets them as HTML.)

e.g. (tags shortened a bit)

[shared: days_to_death] VALUE [/shared: days_to_death]

I've had a look at an example file, and it looks to me like when a value is missing, the file contains a single self-closing tag (ending in "/]", as in the example below) instead of a start tag and an end tag. In your case, days_to_death is missing, which suggests the patient is still alive.

If you look at the example below, the days_to_death value is also missing, but the vital status is "Alive" and there is a value for days to last followup.

[shared:vital_status xsd_ver="2.6" restricted="false" procurement_status="Completed" owner="TSS" cde="5" display_order="25" preferred_name="vital_status" tier="2" source_system_identifier="492461"] Alive[/shared:vital_status]

[shared:days_to_last_followup xsd_ver="1.12" procurement_status="Completed" owner="TSS" cde="3008273" tier="1" precision="day"] 389[/shared:days_to_last_followup]

[shared:days_to_last_known_alive xsd_ver="2.1" procurement_status="Not Available" owner="TSS" cde="" tier="2" precision="day"/]

[shared:days_to_death xsd_ver="1.12" procurement_status="Not Applicable" owner="TSS" cde="3165475" tier="1" precision="day"/]
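
The rule described above (take the text between the start and end tags; treat a self-closing tag as a missing value) can be sketched in base R. This is a hedged illustration, not a TCGA tool: get_tag_value is a hypothetical helper, it uses regular expressions instead of an XML parser, it assumes each element contains plain text only, and it assumes deceased patients are coded "Dead" (or "Deceased") in vital_status. The example string is the one above with the real angle brackets restored and some attributes dropped.

```r
# Hypothetical helper: text of the first <prefix:tag ...>VALUE</prefix:tag>
# element, or NA if the element is absent or self-closing (<prefix:tag .../>).
# Regex sketch, not an XML parser: assumes the element holds plain text only.
get_tag_value <- function(xml_text, tag) {
  pattern <- paste0("<([A-Za-z_]+:)?", tag, "([[:space:]][^>]*)?>",
                    "([^<]*)</([A-Za-z_]+:)?", tag, ">")
  m <- regmatches(xml_text, regexec(pattern, xml_text))[[1]]
  if (length(m) == 0) return(NA_character_)   # absent or self-closing
  value <- trimws(m[4])                       # m[4] is the element text group
  if (value == "") NA_character_ else value
}

# The example above with angle brackets restored (some attributes dropped).
x <- paste0(
  '<shared:vital_status xsd_ver="2.6" procurement_status="Completed"> Alive</shared:vital_status>',
  '<shared:days_to_last_followup xsd_ver="1.12" precision="day"> 389</shared:days_to_last_followup>',
  '<shared:days_to_last_known_alive xsd_ver="2.1" precision="day"/>',
  '<shared:days_to_death xsd_ver="1.12" procurement_status="Not Applicable" precision="day"/>'
)

vital     <- get_tag_value(x, "vital_status")                       # "Alive"
death     <- as.numeric(get_tag_value(x, "days_to_death"))          # NA
follow_up <- as.numeric(get_tag_value(x, "days_to_last_followup"))  # 389

# Assumption: deceased patients are coded "Dead" (or "Deceased").
event <- as.integer(tolower(vital) %in% c("dead", "deceased"))
# Event at days_to_death; otherwise censored at the last follow-up.
time  <- if (event == 1L) death else follow_up
```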

modified 5.0 years ago • written 5.0 years ago by dirigible2012

Thanks, but what does xsd_ver="1.12" mean?

written 5.0 years ago by jack

Hey dirigible2012/Stephanie, is there a file that explains the XML tags in the clinical data? I am also doing survival analysis and am looking at the XML files, which seem really large and convoluted; it's taking time to understand them. Is there a guide describing the XML tags, so that I can parse out the necessary information? I might need other clinical data in the future as well.

Thanks so much!

written 4.8 years ago by srividyanathan20060

We (at SolveBio) have actually gone through the individual clinical patient information files for each TCGA cancer type and parsed out some of this information. See https://www.solvebio.com/library/TCGA/1.2.0-2015-02-11/PatientInformation for more information about the data, and this IPython notebook for an example of how to access the data (SolveBio is free for academic/noncommercial use, so sign up and try it out). It was kind of a mess, but I think we've done a decent job. ICGC is quite a bit easier to work with and includes a lot of TCGA.

written 3.9 years ago by dandan350
Miao Yu wrote:

It's easy to fetch these data with R.

TCGA-Assembler is a very good tool for getting them.

Assuming you are familiar with R:

First, download the tool and unpack it.

Second, run source("Module_A.R").

Third, execute the following command:

DownloadClinicalData(traverseResultFile = "./DirectoryTraverseResult_Jul-08-2014.rda", saveFolderName ="./UserManualExampleData/RawData.TCGA-Assembler", cancerType = "BLCA", clinicalDataType = c("patient", "drug", "follow_up", "radiation"))

saveFolderName = "./UserManualExampleData/RawData.TCGA-Assembler"  # set the directory where files are saved

cancerType = "BLCA"  # choose the cancer type

clinicalDataType = c("patient", "drug", "follow_up", "radiation")  # choose the types of clinical data to download

If you just want the data for survival analysis, you can choose only "follow_up" and use the "days_to_death" and "days_to_last_follow_up" columns in that file as the death times and censoring times, respectively.
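
A minimal base-R sketch of turning such a follow-up table into survival times. It assumes a hypothetical data frame with the two columns named above plus a patient barcode column, one row per follow-up record (so a patient can appear more than once), and non-numeric placeholders such as "[Not Applicable]" for missing values. A patient with any recorded days_to_death counts as an event at that time; otherwise the patient is censored at the latest follow-up. The survival package is only used if it is installed.

```r
# Hypothetical follow-up table (column names as in the post above).
fu <- data.frame(
  bcr_patient_barcode    = c("P1", "P1", "P2", "P3"),
  days_to_death          = c("[Not Applicable]", "[Not Applicable]", "730", "[Not Applicable]"),
  days_to_last_follow_up = c("120", "389", "[Not Available]", "1015"),
  stringsAsFactors = FALSE
)

# Non-numeric placeholders become NA.
to_days   <- function(x) suppressWarnings(as.numeric(x))
max_or_na <- function(v) if (all(is.na(v))) NA_real_ else max(v, na.rm = TRUE)

# Collapse one patient's records into a single (time, status) pair.
surv_one_patient <- function(d) {
  death  <- to_days(d$days_to_death)
  follow <- to_days(d$days_to_last_follow_up)
  died   <- any(!is.na(death))
  data.frame(
    patient = d$bcr_patient_barcode[1],
    time    = if (died) max_or_na(death) else max_or_na(follow),
    status  = as.integer(died),   # 1 = death observed, 0 = censored
    stringsAsFactors = FALSE
  )
}

surv <- do.call(rbind, lapply(split(fu, fu$bcr_patient_barcode), surv_one_patient))
rownames(surv) <- NULL

if (requireNamespace("survival", quietly = TRUE)) {
  fit <- survival::survfit(survival::Surv(time, status) ~ 1, data = surv)
  print(fit)
}
```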

Alternatively, you can get the clinical data from this link:

https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/ucec/bcr/nationwidechildrens.org/bio/clin/

Good luck!

modified 4.2 years ago • written 4.2 years ago by Miao Yu
Zhenyu Zhang wrote:

I have a strong opinion against using TCGA data for survival analysis; please correct me if I am wrong.

If you check days_to_death or days_to_last_contact, you will find values going back as far as 2000 days, i.e. diagnoses made well before TCGA even started. My suspicion is that these patients came from other programs and were diagnosed before the TCGA project. If I am correct, there is a large bias here: only patients who were still alive were later recruited to TCGA, while those from these legacy programs who had already died never show up in TCGA. I guess the majority of people who use TCGA data for analysis never thought about this.

So these times need to be adjusted to TCGA dates by subtracting either days_to_collection or days_to_procuration of the samples. The new problem is that the second field is almost entirely empty, while the first is about 80% empty. This means that if you start with a 500-patient project, you get about 400 patients with either days_to_death or days_to_last_contact available, and this drops to fewer than 100 with days_to_collection. That is not enough for any kind of survival comparison, e.g. by biomarker or clinical category.
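
The adjustment and the resulting loss of patients can be sketched as follows. The data frame and its values are made up for illustration; the column names follow the post (the real TCGA fields may be named differently), and the sketch only re-anchors times at sample collection as described. It does not model or correct the selection bias itself.

```r
# Made-up per-patient values, all in days since diagnosis (names as in the post).
clin <- data.frame(
  days_to_death        = c(NA, 800,   NA, 2100, NA),
  days_to_last_contact = c(400, NA, 1500,   NA, NA),
  days_to_collection   = c(30,  NA, 1200, 1900, 10)
)

# Time since diagnosis: death if recorded, otherwise last contact (censored).
clin$status       <- as.integer(!is.na(clin$days_to_death))
clin$time_from_dx <- ifelse(clin$status == 1L, clin$days_to_death, clin$days_to_last_contact)

# Re-anchor the clock at sample collection; NA whenever the collection day is missing.
clin$time_from_collection <- clin$time_from_dx - clin$days_to_collection

# How many patients survive each step (here 5, then 4, then 3).
counts <- c(
  patients        = nrow(clin),
  with_outcome    = sum(!is.na(clin$time_from_dx)),
  with_collection = sum(!is.na(clin$time_from_collection))
)
print(counts)
```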

written 4.0 years ago by Zhenyu Zhang
nick wrote:

Try the Synapse platform (you need to register, but you can sign in with a Google account).

https://www.synapse.org/#!Synapse:syn300013

For example, here you can find survival data for Lung Squamous Cell Carcinoma.

https://www.synapse.org/#!Synapse:syn1446127/version/3

written 5.0 years ago by nick
TriS wrote:

Even if a little late, you can analyze survival using the example here:

http://bioinformatics.mdanderson.org/Supplements/ResidualDisease/Reports/osCurves.html

That page covers overall survival (in ovarian cancer), but it also links to instructions for building the dataset and your own analysis for your preferred tumor type.
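
Since the original question is about survival for samples with gene expression data, here is one common way to compare survival between expression groups once time, status and expression sit in one table: a median split, Kaplan-Meier estimates per group, and a log-rank test from the survival package. The data are invented, the median split is only one (crude) choice of cut-off, and the survival calls are guarded so the sketch still runs if the package is missing.

```r
# Invented example: one row per patient.
dat <- data.frame(
  time   = c(389, 730, 1015, 120, 560, 900, 45, 1300),   # days
  status = c(0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L),            # 1 = death, 0 = censored
  expr   = c(5.2, 8.1, 3.3, 9.4, 7.7, 2.9, 8.8, 4.1)     # expression of one gene
)

# Dichotomize at the median expression.
dat$group <- factor(ifelse(dat$expr > median(dat$expr), "high", "low"),
                    levels = c("low", "high"))

if (requireNamespace("survival", quietly = TRUE)) {
  # Kaplan-Meier estimate per group, then a log-rank test of the difference.
  fit <- survival::survfit(survival::Surv(time, status) ~ group, data = dat)
  print(fit)
  print(survival::survdiff(survival::Surv(time, status) ~ group, data = dat))
}
```

Calling plot(fit) afterwards draws the two Kaplan-Meier curves.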

written 4.3 years ago by TriS
EagleEye wrote:

This should be the easiest way: in PROGgene you can select from its datasets or upload your own. FYI, it also has datasets from TCGA.

http://watson.compbio.iupui.edu/chirayu/proggene/database/?url=proggene

Reference:

http://www.biomedcentral.com/1471-2407/14/970/abstract

You can also check previous posts explaining how to download Clinical data from TCGA.

A: Clinical Survival data of TCGA

modified 4.2 years ago • written 4.2 years ago by EagleEye
xushutan wrote:

A website with breast cancer survival curves for different subtypes (luminal A, luminal B, basal, HER2 and normal-like): http://tumorsurvival.org/

written 21 months ago by xushutan