Question: TCGAbiolinks TCGA-BRCA RNA-seq clinical data
Matina (United Kingdom/University of Edinburgh) wrote, 17 months ago:

Hi all,

I have downloaded the TCGA-BRCA RNA-seq data and the associated clinical information using the code below.

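library(TCGAbiolinks)
library(SummarizedExperiment)

CancerProject <- "TCGA-BRCA"

# Query the HTSeq count files for the project. HTSeq was the GDC workflow when this
# was posted; current GDC releases provide "STAR - Counts" instead.
query <- GDCquery(project = CancerProject,
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - Counts")

# Get the sample barcodes and keep only primary solid tumours ("TP")
samplesDown <- getResults(query, cols = c("cases"))

dataSmTP <- TCGAquery_SampleTypes(barcode = samplesDown,
                                  typesample = "TP")

# Re-run the query restricted to the primary-tumour barcodes
queryDown <- GDCquery(project = CancerProject,
                      data.category = "Transcriptome Profiling",
                      data.type = "Gene Expression Quantification",
                      workflow.type = "HTSeq - Counts",
                      barcode = dataSmTP)

GDCdownload(query = queryDown, directory = "BRC_RESULTS/TCGA/htseq_data/")

# Build a SummarizedExperiment and save it; the clinical data end up in colData(dataPrep)
dataPrep <- GDCprepare(query = queryDown,
                       save = TRUE,
                       directory = "BRC_RESULTS/TCGA/htseq_data/",
                       save.filename = "htseq_counts.rda",
                       summarizedExperiment = TRUE)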

The clinical data contain columns such as days_to_death and days_to_last_follow_up, as well as columns such as subtype_OS.Time and subtype_OS.event.

What is the difference between the columns that start with subtype_ and the rest, and which should I use for survival analysis? So far I have used the subtype_ columns in my analysis, and I am wondering whether this is correct.

Thanks a lot,

Matina


Dear Matina,

What is your goal with the RNA-seq data? Differential expression analysis? Inspecting the expression of specific genes? Or looking for molecular subtype patterns and survival analysis? I think you have already had an answer from one of the creators of the R package on GitHub, correct?

https://github.com/BioinformaticsFMRP/TCGAbiolinks/issues/227

Best,

Efstathios

Reply written 17 months ago by svlachavas

Hi Efstathios,

I have a set of genes of interest, and I want to see whether they are associated with clinical outcomes and molecular subtype patterns. You are right, I got an answer on GitHub.

Thanks a lot for your answer!

Matina

Reply written 17 months ago by Matina
atakanekiz wrote, 17 months ago:

Hi Matina,

For survival analyses I would use days_to_death (for patients who have died) and days_to_last_follow_up (for patients who are still alive). I think the columns starting with subtype_ might be manually curated data. I'm not 100% sure, but subtype_OS.Time sounds like the length of time for which the tumor was classified as a given subtype (primary, metastatic, stage I/II/III, etc.). days_to_death seems a more straightforward variable.
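A minimal sketch of combining the two columns into a survival time and event indicator, assuming the clinical table comes from colData(dataPrep) and contains barcode, vital_status (coded "Alive"/"Dead"), days_to_death and days_to_last_follow_up. Check the real column names and coding with colnames() and table() first; the helper build_os() and the toy data below are hypothetical, for illustration only.

```r
# Minimal sketch (assumption): derive overall-survival time and event from the
# GDC clinical columns. Column names and the "Alive"/"Dead" coding of
# vital_status are assumed; verify them on your own colData first.
build_os <- function(clin) {
  clin <- as.data.frame(clin)
  dead <- clin$vital_status == "Dead"
  # Deceased patients: time to death; everyone else: time to last follow-up (censored)
  os_time <- ifelse(dead, clin$days_to_death, clin$days_to_last_follow_up)
  os <- data.frame(
    barcode  = clin$barcode,
    os_time  = as.numeric(os_time),
    os_event = as.integer(dead)  # 1 = death observed, 0 = censored
  )
  # Drop patients without a usable time or status
  os[!is.na(os$os_time) & !is.na(os$os_event), ]
}

# Hypothetical toy input, for illustration only (not real TCGA values)
toy <- data.frame(
  barcode = c("P1", "P2", "P3"),
  vital_status = c("Dead", "Alive", "Alive"),
  days_to_death = c(500, NA, NA),
  days_to_last_follow_up = c(NA, 1200, NA)
)
os <- build_os(toy)
print(os)
```

On the real data this would be something like os <- build_os(colData(dataPrep)), after which the survival package can fit a Kaplan-Meier curve with survfit(Surv(os_time, os_event) ~ 1, data = os). If a patient has more than one primary-tumour sample, keep one row per patient before fitting.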

Atakan


Hi Atakan,

This is correct. One of the TCGAbiolinks developers answered on GitHub that everything starting with subtype_ is metadata from the papers that analysed the samples, and suggested using days_to_death. What is strange, though, is that the subtype_ OS column has values for patients whose days_to_last_follow_up is missing, or reports a completely different number of days.
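One way to see how often the two sources disagree is to line them up patient by patient. This is a minimal sketch: compare_os() and the toy data are hypothetical, and it assumes subtype_OS.Time is in days (the thread does not say which units the paper-derived columns use, so check the source paper first) and that vital_status is coded "Alive"/"Dead".

```r
# Minimal sketch (assumption): flag patients where the paper-derived
# subtype_OS.Time disagrees with the GDC follow-up columns. Assumes both are
# in days and that vital_status is coded "Alive"/"Dead"; verify before use.
compare_os <- function(clin) {
  clin <- as.data.frame(clin)
  dead <- clin$vital_status == "Dead"
  # GDC time: days_to_death if dead, otherwise days_to_last_follow_up
  gdc_time   <- as.numeric(ifelse(dead, clin$days_to_death, clin$days_to_last_follow_up))
  paper_time <- as.numeric(clin$subtype_OS.Time)
  status <- ifelse(is.na(paper_time), "no_paper_value",
            ifelse(is.na(gdc_time), "missing_in_gdc",
            ifelse(gdc_time == paper_time, "agree", "differ")))
  data.frame(barcode = clin$barcode, gdc_time, paper_time, status)
}

# Hypothetical toy input, for illustration only (not real TCGA values)
toy <- data.frame(
  barcode = c("P1", "P2", "P3"),
  vital_status = c("Dead", "Alive", "Alive"),
  days_to_death = c(500, NA, NA),
  days_to_last_follow_up = c(NA, NA, 900),
  subtype_OS.Time = c(500, 750, 1100)
)
cmp <- compare_os(toy)
print(table(cmp$status))
```

On the real data, table(compare_os(colData(dataPrep))$status) would give a quick count of agreeing, differing and missing values.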

Thanks again, Matina

Reply written 17 months ago by Matina
igor (United States) wrote, 17 months ago:

You could also consider using the Pan-Cancer Atlas curated survival data from Xena:


Thank you very much Igor! I will have a look at this!

Reply written 17 months ago by Matina