Comparing RNA-seq count data from two different sources
7.7 years ago
poisonAlien ★ 3.1k

Hello,

I have RNA-seq read count data from two different sources.

The first is TCGA level-3 data, which has a 'raw_counts' column for each gene.

The second is a GEO dataset, where the submitter has provided "scaled_counts" for each gene. I guess these were calculated with estimateSizeFactors() from DESeq.

Now, how do I compare these count tables when one is normalized/scaled and the other is raw?

Do I just scale the unscaled TCGA data and compare it with the other table, or do I have to combine both tables and scale them together before proceeding?

You'd be best off downloading the raw data for the GEO dataset and then processing it exactly as the TCGA dataset was processed. Otherwise you're likely to just have a mess on your hands.
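
To illustrate why separately scaled tables are hard to compare, here is a minimal sketch of median-of-ratios scaling, the idea behind estimateSizeFactors() in DESeq. It is not DESeq itself: the helper name size_factors and the tiny count matrices are made up for illustration, and whether the GEO submitter actually scaled the data this way is only the guess stated in the question.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix.

    Hypothetical helper written for this illustration only.
    """
    counts = np.asarray(counts, dtype=float)
    # Use only genes with a nonzero count in every sample, so the logs are finite.
    keep = (counts > 0).all(axis=1)
    log_counts = np.log(counts[keep])
    # Per-gene reference: log of the geometric mean across samples.
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Per-sample size factor: median ratio of the sample to the reference.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

# Made-up counts: 3 genes x 2 samples per source (not real TCGA or GEO data).
tcga = np.array([[100, 200], [50, 100], [30, 60]])
geo = np.array([[1000, 500], [400, 200], [900, 450]])

sf_tcga_alone = size_factors(tcga)
sf_combined = size_factors(np.hstack([tcga, geo]))

print("TCGA scaled alone:      ", sf_tcga_alone)
print("TCGA scaled with GEO:   ", sf_combined[:2])
```

The size factors for the same TCGA samples change depending on which other samples are in the table, because the per-gene reference is computed across all samples. Counts scaled within one dataset are therefore not on the same scale as counts scaled within another, which is one reason to start from raw data for both.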

Hi Devon,

You are right. I tried both methods, and neither produces the expected results. I guess I will have to download the raw data. Thank you.