Question: TCGAbiolinks TCGA-BRCA RNA-seq clinical data
3 months ago by
Matina150
United Kingdom/University of Edinburgh
Matina150 wrote:

Hi all,

I have downloaded the TCGA-BRCA RNA-seq data and the associated clinical information using the code below.

library(TCGAbiolinks)

CancerProject <- "TCGA-BRCA"

# Query all HTSeq count files for the project
query <- GDCquery(project = CancerProject,
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - Counts")

# Barcodes of every sample returned by the query
samplesDown <- getResults(query, cols = c("cases"))

# Keep only primary solid tumor (TP) samples
dataSmTP <- TCGAquery_SampleTypes(barcode = samplesDown,
                                  typesample = "TP")

# Repeat the query, restricted to the TP barcodes
queryDown <- GDCquery(project = CancerProject,
                      data.category = "Transcriptome Profiling",
                      data.type = "Gene Expression Quantification",
                      workflow.type = "HTSeq - Counts",
                      barcode = dataSmTP)

# Download the files, then assemble them into a SummarizedExperiment
GDCdownload(query = queryDown, directory = "BRC_RESULTS/TCGA/htseq_data/")

dataPrep <- GDCprepare(query = queryDown,
                       save = TRUE,
                       directory = "BRC_RESULTS/TCGA/htseq_data/",
                       save.filename = "htseq_counts.rda",
                       summarizedExperiment = TRUE)

In the clinical data there are several columns such as days_to_death or days_to_last_follow_up and other columns such as subtype_OS.Time or subtype_OS.event.

What is the difference between the columns that have subtype_ at the beginning and the rest, and which ones should I use for survival analysis? At the moment I have used the subtype_ columns for my analysis, and I am wondering if this is correct.

Thanks a lot,

Matina

rna-seq brca tcga tcgabiolinks • 351 views
modified 3 months ago by igor6.6k • written 3 months ago by Matina150

Dear Matina,

What is your purpose with the RNA-seq data? DE analysis? Inspecting the expression of specific genes? Or looking for molecular subtype patterns and doing survival analysis? I think you already got an answer from one of the creators of the R package on GitHub, correct?

https://github.com/BioinformaticsFMRP/TCGAbiolinks/issues/227

Best,

Efstathios

written 3 months ago by svlachavas480

Hi Efstathios,

I have a set of genes that I am interested in and I want to see if they are associated with clinical outcomes and molecular subtype patterns. You are right, I got an answer in the GitHub account.

Thanks a lot for your answer! Matina

written 3 months ago by Matina150
3 months ago by
atakanekiz70
atakanekiz70 wrote:

Hi Matina,

I would use days_to_death (for deceased patients) and days_to_last_follow_up (for patients who are still alive) for survival analyses. I think the columns that start with subtype_ might be manually curated data. I'm not 100% sure, but subtype_OS.Time sounds like the period during which the tumor was classified as a certain subtype (primary, metastatic, stage I/II/III, etc.). I think days_to_death is the more straightforward choice.
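
A minimal sketch of that approach in base R, assuming the clinical table has a vital_status column coded "Alive"/"Dead" next to days_to_death and days_to_last_follow_up. The toy data frame and the helper name derive_os are made up for illustration; with real data the table would come from something like as.data.frame(colData(dataPrep)).

```r
# Toy stand-in for the clinical table; all values are invented.
clin <- data.frame(
  barcode                = c("P1", "P2", "P3", "P4"),
  vital_status           = c("Dead", "Alive", "Alive", "Dead"),
  days_to_death          = c(450, NA, NA, 1200),
  days_to_last_follow_up = c(NA, 900, 30, NA),
  stringsAsFactors       = FALSE
)

# Hypothetical helper: builds an overall-survival time and event indicator.
derive_os <- function(clin) {
  dead <- clin$vital_status == "Dead"   # NA if vital_status is missing
  # Event indicator: 1 = death observed, 0 = censored at last follow-up
  clin$os_event <- as.integer(dead)
  # Time: days to death for deceased patients, otherwise days to last follow-up
  clin$os_time <- ifelse(dead, clin$days_to_death, clin$days_to_last_follow_up)
  clin
}

os <- derive_os(clin)
print(os[, c("barcode", "os_time", "os_event")])
```

The two new columns could then be passed to survival::Surv(os$os_time, os$os_event) for Kaplan-Meier or Cox models. Patients with a missing vital_status end up with NA in both columns rather than being silently treated as censored.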

Atakan

written 3 months ago by atakanekiz70

Hi Atakan,

This is correct. I got an answer from one of the developers of TCGAbiolinks on GitHub saying that everything that starts with subtype_ is metadata from the papers that analyzed the samples, and they suggested using days_to_death. In any case, what is strange is that the subtype_ column for OS has values for patients whose days_to_last_follow_up is missing, or it reports a completely different number of days.
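
One way to see how often the two sources disagree is to put them side by side. This is only a sketch in base R on invented values: gdc_os_time is a hypothetical column holding the time taken from days_to_death or days_to_last_follow_up, compare_os is a made-up helper, and subtype_OS.Time is the column name as it appears in the clinical table.

```r
# Toy data; all values are invented for illustration.
clin <- data.frame(
  barcode         = c("P1", "P2", "P3", "P4"),
  gdc_os_time     = c(450, 900, NA, 1200),
  subtype_OS.Time = c(450, 1100, 365, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: labels each patient by how the two time columns relate.
compare_os <- function(clin, tolerance = 0) {
  a <- clin$gdc_os_time
  b <- clin$subtype_OS.Time
  status <- ifelse(is.na(a) & is.na(b), "both_missing",
            ifelse(is.na(a), "gdc_missing",
            ifelse(is.na(b), "subtype_missing",
            ifelse(abs(a - b) <= tolerance, "match", "differ"))))
  data.frame(barcode = clin$barcode, gdc = a, subtype = b,
             status = status, stringsAsFactors = FALSE)
}

cmp <- compare_os(clin)
print(cmp)
print(table(cmp$status))
```

The tolerance argument allows small differences in days to count as a match, which may be useful if the two sources were frozen at slightly different follow-up dates.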

Thanks again, Matina

written 3 months ago by Matina150
3 months ago by
igor6.6k
United States
igor6.6k wrote:

You could also consider using the Pan-Cancer Atlas curated survival data from Xena:

modified 3 months ago by zx87545.0k • written 3 months ago by igor6.6k

Thank you very much Igor! I will have a look at this!

written 3 months ago by Matina150