TCGAbiolinks TCGA-BRCA RNA-seq clinical data
3.1 years ago
Matina

Hi all,

I have downloaded the TCGA-BRCA RNA-seq data and the associated clinical information using the code below.

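library(TCGAbiolinks)

CancerProject <- "TCGA-BRCA"

# Query all gene expression count files for the project.
# Note: later GDC data releases replaced the "HTSeq - Counts" workflow with
# "STAR - Counts", so on current releases this query may need updating.
query <- GDCquery(project = CancerProject,
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - Counts")

# Sample barcodes returned by the query
samplesDown <- getResults(query, cols = c("cases"))

# Keep only primary solid tumor samples ("TP")
dataSmTP <- TCGAquery_SampleTypes(barcode = samplesDown,
                                  typesample = "TP")

# Re-run the query restricted to the primary tumor barcodes
queryDown <- GDCquery(project = CancerProject,
                      data.category = "Transcriptome Profiling",
                      data.type = "Gene Expression Quantification",
                      workflow.type = "HTSeq - Counts",
                      barcode = dataSmTP)

GDCdownload(query = queryDown, directory = "BRC_RESULTS/TCGA/htseq_data/")

# Build a SummarizedExperiment (counts in assay(), clinical data in colData())
# and save it as htseq_counts.rda
dataPrep <- GDCprepare(query = queryDown,
                       save = TRUE,
                       directory = "BRC_RESULTS/TCGA/htseq_data/",
                       save.filename = "htseq_counts.rda",
                       summarizedExperiment = TRUE)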

In the clinical data there are several columns such as days_to_death or days_to_last_follow_up and other columns such as subtype_OS.Time or subtype_OS.event.

What is the difference between the columns starting with subtype_ and the rest, and which ones should I use for survival analysis? At the moment I have used the subtype_ columns for my analysis, and I am wondering if this is correct.
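A minimal sketch of how the two groups of columns could be separated, assuming the clinical table is a data frame such as `as.data.frame(SummarizedExperiment::colData(dataPrep))`. The helper `split_clinical_columns()` and the toy data frame (made-up values) are hypothetical, for illustration only:

```r
# Hypothetical helper: split clinical column names into the paper-derived
# "subtype_" columns and all remaining (GDC clinical) columns.
split_clinical_columns <- function(clin) {
  cols <- colnames(clin)
  is_subtype <- startsWith(cols, "subtype_")
  list(subtype = cols[is_subtype], other = cols[!is_subtype])
}

# Toy stand-in for as.data.frame(SummarizedExperiment::colData(dataPrep));
# the values are made up.
clin <- data.frame(
  days_to_death = c(NA, 812),
  days_to_last_follow_up = c(1500, NA),
  subtype_OS.Time = c(1495, 812),
  subtype_OS.event = c(0, 1)
)

groups <- split_clinical_columns(clin)
print(groups$subtype)  # columns that start with "subtype_"
print(groups$other)    # all other clinical columns
```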

Thanks a lot,

Matina

TCGA BRCA TCGAbiolinks RNA-Seq

Dear Matina,

What is your purpose with the RNA-seq data? DE analysis? Inspecting the expression of specific genes? Or looking for molecular subtype patterns and survival analysis? I think you already got an answer from one of the creators of the R package on GitHub, correct?

https://github.com/BioinformaticsFMRP/TCGAbiolinks/issues/227

Best,

Efstathios


Hi Efstathios,

I have a set of genes that I am interested in, and I want to see whether they are associated with clinical outcomes and molecular subtype patterns. You are right, I got an answer on GitHub.

Thanks a lot for your answer! Matina

3.1 years ago
atakanekiz

Hi Matina,

I would go with days_to_death and days_to_last_follow_up (the latter for patients who are still alive) for survival analyses. I think the columns that start with subtype_ might be manually curated data. I'm not 100% sure, but subtype_OS.Time sounds like the time period during which the tumor was classified as a certain subtype (primary, metastatic, stage I/II/III, etc.). I think days_to_death is the more straightforward variable.
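A minimal base-R sketch of this approach, assuming the clinical table has columns `vital_status` (coded "Alive"/"Dead"), `days_to_death` and `days_to_last_follow_up`. The helper `make_os()` and the toy values are hypothetical:

```r
# Hypothetical helper: build overall-survival time and event from GDC columns.
# Assumes vital_status is coded "Alive"/"Dead".
make_os <- function(clin) {
  dead <- clin$vital_status == "Dead"
  # Dead patients: time to death (event); everyone else: time to last
  # follow-up (censored)
  time <- ifelse(dead, clin$days_to_death, clin$days_to_last_follow_up)
  data.frame(time = as.numeric(time), event = as.integer(dead))
}

# Toy clinical table with made-up values
clin <- data.frame(
  vital_status = c("Alive", "Dead", "Alive"),
  days_to_death = c(NA, 812, NA),
  days_to_last_follow_up = c(1500, NA, NA),
  stringsAsFactors = FALSE
)

os <- make_os(clin)
# Drop patients with no usable time or event (e.g. missing follow-up)
os_complete <- os[!is.na(os$time) & !is.na(os$event), ]
print(os_complete)
```

The resulting `time` and `event` columns can then be passed to `survival::Surv(time, event)` and, for example, `survival::survfit()` for Kaplan-Meier curves.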

Atakan


Hi Atakan,

This is correct. I got an answer from one of the developers of TCGAbiolinks on GitHub saying that everything starting with subtype_ is metadata from the papers that analyzed the samples, and they suggested using days_to_death. What is strange, though, is that the subtype_ OS column has clinical information for patients who are shown as missing in the days_to_last_follow_up column, or reports a completely different number of days.
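One way to quantify this discrepancy is to compare, patient by patient, the GDC-derived time with subtype_OS.Time. This is a hypothetical sketch (`compare_os_sources()` and the toy data are illustrative), and it assumes both columns are measured in days, which should be checked against the source paper before trusting the comparison:

```r
# Hypothetical helper: compare GDC-derived OS time with subtype_OS.Time.
# Assumes both are in days; check the units against the source paper first.
compare_os_sources <- function(clin, tol = 0) {
  dead <- clin$vital_status == "Dead"
  gdc_time <- ifelse(dead, clin$days_to_death, clin$days_to_last_follow_up)
  # as.character() first, in case the column was read as a factor
  paper_time <- as.numeric(as.character(clin$subtype_OS.Time))
  has_gdc <- !is.na(gdc_time)
  has_paper <- !is.na(paper_time)
  both <- has_gdc & has_paper
  # Only patients with both values are compared; tol allows small rounding gaps
  differ <- both & abs(gdc_time - paper_time) > tol
  c(only_in_subtype = sum(!has_gdc & has_paper),
    in_both = sum(both),
    differ = sum(differ, na.rm = TRUE))
}

# Toy clinical table with made-up values
clin <- data.frame(
  vital_status = c("Alive", "Dead", "Alive"),
  days_to_death = c(NA, 812, NA),
  days_to_last_follow_up = c(1500, NA, NA),
  subtype_OS.Time = c(1495, 812, 2100),
  stringsAsFactors = FALSE
)

res <- compare_os_sources(clin)
print(res)
```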

Thanks again, Matina

3.1 years ago
igor

You could also consider using the Pan-Cancer Atlas curated survival data from Xena:


Thank you very much Igor! I will have a look at this!
