Comparing RSEM and counts data
7.5 years ago
Angel ▴ 210

Hi,

I have RSEM data for a tumor type and I need to compare it to gene counts data. Is there a methodology for comparing RSEM values with counts?

Thanks

RNA-Seq Counts RSEM • 9.3k views
7.5 years ago

Why do you need to compare the two results? Some differential expression tools work directly on gene counts, but raw counts are expected to differ from normalized expression levels (for example, longer genes should have relatively larger counts at the same expression level).
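
To make the gene-length point concrete, here is a minimal sketch of how raw counts relate to RPKM and TPM. The gene names, counts, and lengths are made-up toy values, and the function names are only illustrative; real pipelines typically use effective lengths rather than plain gene lengths.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene length per million counted reads."""
    total_reads = sum(counts.values())
    return {g: c * 1e9 / (lengths_bp[g] * total_reads) for g, c in counts.items()}


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rate = {g: c / lengths_bp[g] for g, c in counts.items()}
    scale = sum(rate.values())
    return {g: r * 1e6 / scale for g, r in rate.items()}


# Toy data (hypothetical): geneA and geneB have the same count,
# but geneB is four times longer.
counts = {"geneA": 1000, "geneB": 1000, "geneC": 2000}
lengths_bp = {"geneA": 1000, "geneB": 4000, "geneC": 2000}

rpkm_values = rpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)

for gene in counts:
    print(gene, counts[gene], round(rpkm_values[gene], 1), round(tpm_values[gene], 1))
```

With these toy numbers, geneA and geneB have identical counts but geneA's normalized value is four times larger, which is why raw counts and length-normalized values are not directly interchangeable.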

Also, the annotations can affect the normalized expression. For example, having reads align to multiple transcripts could be an issue, depending upon how you processed your data. Here is a link to a blog post to illustrate this:

http://cdwscience.blogspot.com/2014/04/differential-expression-without.html

However, the secondary message from that blog post is that the popular methods (e.g. Cufflinks, RSEM) for quantifying gene-level expression are pretty robust. So, the RSEM mRNA quantification should be fine (and if you wanted to compare it to something, you should compare it to other mRNA quantification methods, not raw counts). My personal preference is to just work with the RPKM/FPKM/TPM normalized expression values, and not worry about the raw counts.


Hi Charles,

Please look at my reply to Devon as well. I have a pre-normalized RSEM dataset for a tumor type (dataset1), and I have counts and the corresponding RPKM values for another set of samples (dataset2).

The problem at hand is to compare the expression of a couple of genes across these two datasets, which use different normalizations. I thought that since RSEM doesn't take gene length into account, it would be more relevant to compare it to the counts data, not RPKM. But I don't know the methodology.


To be clear, RSEM is an algorithm, not a unit. In fact, I'm pretty sure that RSEM provides length-corrected normalized expression values (FPKM, the paired-end counterpart of RPKM, and TPM) along with other metrics (such as expected counts).

Independent of the mRNA quantification method or metric, there will probably be batch effects between the two datasets (especially if there are differences in the sample preparation). If you have members of the same group (say, tumor versus normal) in both datasets, you can correct for batch effects with something like a 2-way ANOVA. Otherwise, the interpretation will be tricky, no matter what.
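
As a rough illustration of the batch-correction idea, the sketch below removes a per-dataset offset for a single gene. It is a simplified stand-in for a full 2-way ANOVA: it only estimates the batch term of an additive group + batch model as "batch mean minus grand mean", which matches the least-squares estimate only when the design is balanced (the same number of tumor and normal samples in each dataset). The sample values are made-up log-scale expression values and `remove_batch_offset` is a hypothetical helper name.

```python
from statistics import mean


def remove_batch_offset(samples):
    """samples: list of (batch, group, log_expression) for one gene.

    Subtracts each batch's deviation from the grand mean.
    Assumes a balanced design with every group present in every batch.
    """
    grand_mean = mean(value for _, _, value in samples)
    batches = {batch for batch, _, _ in samples}
    offset = {
        b: mean(v for batch, _, v in samples if batch == b) - grand_mean
        for b in batches
    }
    return [(b, g, v - offset[b]) for b, g, v in samples]


# Toy data (hypothetical): dataset2 reads about 2 log units higher overall.
samples = [
    ("dataset1", "tumor", 8.0), ("dataset1", "tumor", 8.4),
    ("dataset1", "normal", 6.0), ("dataset1", "normal", 6.4),
    ("dataset2", "tumor", 10.0), ("dataset2", "tumor", 10.4),
    ("dataset2", "normal", 8.0), ("dataset2", "normal", 8.4),
]

adjusted = remove_batch_offset(samples)
tumor_mean = mean(v for _, g, v in adjusted if g == "tumor")
normal_mean = mean(v for _, g, v in adjusted if g == "normal")
print(round(tumor_mean - normal_mean, 3))
```

Note that this only works because both groups appear in both datasets. If one dataset contains only tumor samples and the other only normal samples, the batch offset and the biological difference are confounded, and subtracting the batch mean would remove both.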

7.5 years ago

Just use featureCounts or htseq-count to get the per-gene counts, load things into R, sort so they're in the same order, and then compare.
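
The "same order, then compare" step can be sketched as follows. The answer suggests R; this sketch uses Python purely for illustration. It matches two per-gene tables on their shared gene IDs and compares them with a Spearman rank correlation, which is my own choice here because the two tables are on different scales. The gene IDs and values are made-up toy data, and both dictionaries stand in for tables you would load from files.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_on_shared_genes(table_a, table_b):
    """Keep genes present in both tables, in the same order, then correlate ranks."""
    shared = sorted(set(table_a) & set(table_b))
    x = [table_a[g] for g in shared]
    y = [table_b[g] for g in shared]
    return shared, pearson(average_ranks(x), average_ranks(y))


# Toy data (hypothetical): one table of RSEM values, one of raw counts.
rsem_values = {"geneA": 35.2, "geneB": 120.5, "geneC": 2100.0, "geneD": 60.1}
raw_counts = {"geneC": 52000, "geneB": 3100, "geneA": 900, "geneE": 48000, "geneD": 1500}

shared_genes, rho = spearman_on_shared_genes(rsem_values, raw_counts)
print(shared_genes, round(rho, 3))
```

Genes found in only one table (geneE here) are dropped before the comparison, so it is worth checking how many genes survive the matching step.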


Hi Devon,

I am sorry, I don't know what you mean. I do not have raw data to work with. I only have pre-normalized RSEM data for a tumor type, and only gene counts (and RPKM) from another dataset, which is from normal brain. I am supposed to compare gene expression for a few genes across these two datasets.

My question was how to compare RSEM vs. counts. I am now adding: if it is not possible to correlate RSEM vs. counts directly, how can I compare RSEM vs. RPKM?

I don't know how featureCounts is supposed to help me.


Is this data you downloaded from TCGA? If so, you should have mentioned that you don't have the raw data, just a table of RSEM counts.
