I recently ran an RNA-Seq analysis on some mouse tumors using DESeq2. The analysis looks good, but problems arise when we try to compare the results to the TCGA mRNA expression data posted on http://www.cbioportal.org/. We would like to show that genes upregulated in our tumors are also up in the TCGA data, and that genes downregulated in our tumors are also down in TCGA. However, this only seems to work for the upregulated genes; oddly, no matter what list of genes I enter for the TCGA data, it shows them as upregulated more often than downregulated.
Do cancer transcriptomes really have more upregulated genes than downregulated ones? My DESeq2 analysis shows that roughly equal numbers of genes are downregulated and upregulated in our mouse tumors. Why does the TCGA data show so many more upregulated genes?
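For reference, this is roughly how I count up- versus downregulated genes. It is only a minimal sketch: it assumes the DESeq2 results were exported as rows with the `log2FoldChange` and `padj` columns, and the 0.05 cutoff, the function name `count_directions`, and the toy rows are illustrative assumptions, not my real data.

```python
def count_directions(results, padj_cutoff=0.05):
    """Count significantly up- and downregulated genes in a results table."""
    up = down = 0
    for row in results:
        padj = row["padj"]
        # DESeq2 can report NA for padj (e.g. filtered genes); skip those rows.
        if padj is None or padj >= padj_cutoff:
            continue
        if row["log2FoldChange"] > 0:
            up += 1
        elif row["log2FoldChange"] < 0:
            down += 1
    return up, down


# Made-up toy rows, for illustration only.
toy = [
    {"gene": "GeneA", "log2FoldChange": 2.1, "padj": 0.001},
    {"gene": "GeneB", "log2FoldChange": -1.7, "padj": 0.01},
    {"gene": "GeneC", "log2FoldChange": 0.3, "padj": 0.60},
    {"gene": "GeneD", "log2FoldChange": -2.4, "padj": None},
]

up, down = count_directions(toy)
print(f"up={up} down={down}")
```

Applying the same count to any other table with a fold-change and an adjusted p-value column should make the up/down balance directly comparable.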
Another curious fact is that this same data was analyzed a few years ago by a collaborator using Cuffdiff. That analysis also showed many more upregulated than downregulated genes. Is it a known issue that DESeq2 reports more downregulated genes? I have been searching and haven't seen any mention of this online.
Unfortunately, I don't have time to run a DESeq2 analysis on the raw TCGA data, since we are under a huge time crunch.
Thank you for any insights as to why a DESeq2 analysis would give more downregulated genes than the TCGA data and a Cuffdiff analysis of the same data.