Differential expression analysis of normal vs. cancer samples using recount2 data
2.5 years ago
erica.fary ▴ 20

Hi everyone,

I have downloaded GTEx data and TCGA data (only tumor samples are available) for a given cancer type using the "recount" R package. After keeping only the genes shared by the two datasets, I would like to run a differential expression analysis of normal vs. tumor.
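Keeping only the genes shared by both datasets amounts to intersecting the gene IDs and subsetting both matrices by name, so that their rows line up before they are combined. A minimal base-R sketch, assuming two count matrices with gene IDs as row names; `gtex_counts` and `tcga_counts` are hypothetical toy objects, not the recount2 data:

```r
# Hypothetical toy count matrices (genes x samples); gene IDs are the row names
gtex_counts <- matrix(1:6, nrow = 3,
                      dimnames = list(c("geneA", "geneB", "geneC"),
                                      c("GTEX_1", "GTEX_2")))
tcga_counts <- matrix(7:12, nrow = 3,
                      dimnames = list(c("geneB", "geneC", "geneD"),
                                      c("TCGA_1", "TCGA_2")))

# Genes present in both datasets, in one fixed order
shared_genes <- intersect(rownames(gtex_counts), rownames(tcga_counts))

# Subset both matrices by name so that row i refers to the same gene in each,
# then place the GTEx columns first and the TCGA columns second
combined <- cbind(gtex_counts[shared_genes, , drop = FALSE],
                  tcga_counts[shared_genes, , drop = FALSE])
print(combined)
```

Subsetting by name (rather than by position) protects against the two matrices listing genes in different orders.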

My question is the following: should I apply some kind of batch correction to make the comparison meaningful (and if so, which tool would you suggest)? Or, since recount processed both datasets in the same way, can I compare them directly?

How would you proceed?

NB: I am aware of the https://github.com/mskcc/RNAseqDB database but no data are provided there for my cancer type of interest...

I hope you can help!


recount GTEX TCGA RNA-seq batch

Hey erica.fary; so, your cancer of interest has no matched normals within TCGA itself, and you therefore want to avail of the GTEx dataset? As far as I know, recount data are "uniformly processed"; however, regardless of that processing, if your condition (tumour vs normal) is confounded by project (TCGA vs GTEx), then there is no way that the recount authors could have mitigated batch effects. Can you elaborate on the exact data you retrieved, possibly by sharing your R code?
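The confounding problem can be checked directly. If every tumour sample comes from TCGA and every normal sample from GTEx, a design that includes both a project (batch) term and the condition term is not of full rank, so no linear-model batch adjustment can separate the two effects. A small base-R illustration on hypothetical sample labels (not the actual data):

```r
# Hypothetical sample annotation: condition is perfectly confounded with project
project   <- factor(c("GTEx", "GTEx", "GTEx", "TCGA", "TCGA", "TCGA"))
condition <- factor(c("normal", "normal", "normal", "tumor", "tumor", "tumor"),
                    levels = c("normal", "tumor"))

# Design with both a batch (project) term and the condition of interest
design <- model.matrix(~ project + condition)

# The projectTCGA and conditiontumor columns are identical, so the matrix
# has fewer linearly independent columns than coefficients to estimate
design_rank <- qr(design)$rank
cat("coefficients:", ncol(design), " rank:", design_rank, "\n")
print(identical(unname(design[, "projectTCGA"]), unname(design[, "conditiontumor"])))
```

Here the design has 3 coefficients but rank 2; limma's `nonEstimable()` flags the same situation. With fully confounded data, any apparent tumor vs normal difference also contains the TCGA vs GTEx difference.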


Dear Kevin,

Thanks for your reply and sorry for the delay in replying.

I do think that one of the purposes of recount2 is to enable such comparisons, but I am quite a newbie and was not sure about that...

My code so far looks like:

# I try to follow the procedure described here: http://research.libd.org/recountWorkflow/articles/recount-workflow.html
library(TCGAbiolinks)          # TCGAquery_recount2()
library(recount)               # scale_counts()
library(SummarizedExperiment)  # assays()
library(edgeR)                 # DGEList(), calcNormFactors()
library(limma)                 # voom(), lmFit(), eBayes(), topTable()

ovary_gtex <- scale_counts(TCGAquery_recount2(project = "gtex", tissue = "ovary")$gtex_ovary)
ovary_tcga <- scale_counts(TCGAquery_recount2(project = "tcga", tissue = "ovary")$tcga_ovary)
# I skip some processing steps, but in the end I have a
# gene x sample matrix with the GTEx samples followed by the TCGA samples
ovary_all_data <- cbind(assays(ovary_gtex)$counts, assays(ovary_tcga)$counts)
# keep genes whose mean scaled count is above 0.5
to_keep <- rowMeans(ovary_all_data) > 0.5
dge <- DGEList(counts = ovary_all_data[to_keep, ])
dge <- calcNormFactors(dge)
# GTEx columns come first, so they are labelled "normal", then the TCGA columns "tumor"
samples_groups <- c(rep("normal", ncol(ovary_gtex)), rep("tumor", ncol(ovary_tcga)))
my_group_design <- factor(samples_groups, levels = c("normal", "tumor"))
my_design <- model.matrix(~ my_group_design)
v <- voom(dge, my_design, plot = TRUE)
fit <- lmFit(v, my_design)
efit <- eBayes(fit)
# the last coefficient is the tumor vs normal log2 fold change
DE_topTable <- topTable(efit, coef = ncol(v$design), number = Inf, sort.by = "p")

If you (or anyone else) have any suggestions, I would be grateful :)
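In the code above, `coef = ncol(v$design)` selects the last column of the design. With `levels = c("normal", "tumor")`, `model.matrix(~ group)` produces an intercept (the normal baseline) and a tumor indicator column, so the last coefficient is the tumor minus normal log2 difference; in the original code that column is named `my_group_designtumor`. A base-R check on hypothetical labels (the sample counts are made up for illustration):

```r
# Hypothetical labels: 3 GTEx (normal) samples followed by 2 TCGA (tumor) samples
n_gtex <- 3
n_tcga <- 2
group <- factor(c(rep("normal", n_gtex), rep("tumor", n_tcga)),
                levels = c("normal", "tumor"))

design <- model.matrix(~ group)
print(colnames(design))  # "(Intercept)" "grouptumor"

# The last column is the tumor indicator, i.e. the tumor vs normal contrast
last_coef <- colnames(design)[ncol(design)]
```

Setting `levels` explicitly is what makes "normal" the reference level; without it, R orders levels alphabetically, which happens to give the same result here but would not for other label names.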


Hi, I would e-mail Leonardo to verify that this approach is okay. His e-mail address is available via the link that you included in your comment, i.e., http://research.libd.org/recountWorkflow/articles/recount-workflow.html

