Hi everyone,
I have downloaded GTEx data and TCGA data (only tumor samples are available) for a given cancer type using the "recount" R package. After filtering for the genes common to the two datasets, I would like to run a differential expression analysis of normal vs. tumor.
My question is the following: should I apply some kind of "batch" correction to make the comparison meaningful (and if so, which tool would you suggest?), or, since the data have been processed in the same way by recount, can I compare them directly?
How would you proceed?
NB: I am aware of the https://github.com/mskcc/RNAseqDB database, but it provides no data for my cancer type of interest...
Hope you can help!
Thanks
Hey erica.fary; so, your cancer(s) of interest have no matched normals within TCGA itself, and you therefore want to use the GTEx dataset? As far as I know, recount data are "uniformly processed"; however, if your condition (tumour vs normal) is nonetheless confounded with project (TCGA vs GTEx), then there is no way that the recount authors could have mitigated batch effects, because any systematic difference between the two projects is indistinguishable from the tumour vs normal difference. Can you elaborate on the exact data you retrieved, possibly by sharing your R code?
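To make the confounding point concrete, here is a small Python sketch (not recount or limma code; the three-column design with intercept, condition and project indicators is an illustrative assumption). When every tumour sample comes from TCGA and every normal from GTEx, the condition and project columns are identical, so the design matrix loses rank and the tumour effect cannot be separated from the project effect. A hypothetical design in which some normals also come from TCGA regains full rank.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix (list of rows) via Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below the current rank position with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def design_matrix(samples):
    """Hypothetical design columns: intercept, condition (1 = tumour), project (1 = TCGA)."""
    return [[1, int(cond == "tumour"), int(proj == "TCGA")] for cond, proj in samples]


# Fully confounded: every tumour is from TCGA, every normal is from GTEx.
confounded = [("tumour", "TCGA")] * 3 + [("normal", "GTEx")] * 3
# Hypothetical alternative: some normals also come from TCGA.
partly_crossed = confounded + [("normal", "TCGA")] * 2

print(matrix_rank(design_matrix(confounded)))      # 2: condition and project columns coincide
print(matrix_rank(design_matrix(partly_crossed)))  # 3: condition and project effects are both estimable
```

In the confounded case, dropping the project column makes the model fit, but the "tumour" coefficient then silently absorbs any TCGA vs GTEx batch effect; no correction method can pull the two apart without samples that cross condition and project.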
Dear Kevin,
Thanks for your reply, and sorry for the delay in responding.
I do think that one of the purposes of recount2 is to enable such comparisons, but I am quite a newbie and was not sure about that...
My code so far looks like:
Let me know if you (or anyone else) have any suggestions; I would be grateful :)
Hi, I would e-mail Leonardo to verify that this is okay. His e-mail address is available via the link that you included in your comment, i.e., http://research.libd.org/recountWorkflow/articles/recount-workflow.html