Question: Should I use "assigned reads" or total reads (assigned + unassigned) to calculate the RPKM value?
gustavoborin01 (University of Campinas, Brazil) wrote, 4.5 years ago:

Dear all,

I'm recalculating the RPKM values for an RNA-seq dataset in Rsubread using the featureCounts function, and I'd like to know whether I should use just the "Assigned" reads or the total reads, including "Unassigned_Ambiguity", "Unassigned_MultiMapping", etc. (see below), in the RPKM formula. Searching the forums and Mortazavi et al. (2008), I have only found that "N is the total number of mappable reads in the experiment". Could anybody please help in this regard?

RPKM = N/(L*T) 

where: 

N: number of reads assigned to a gene

L: length of the gene (kb)

T: total mapped reads (Millions)
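As a minimal sketch of the formula above, the arithmetic can be written out as follows. It is in Python purely for illustration (the thread itself uses Rsubread in R); the function name `rpkm`, its parameter names, and the example gene are hypothetical and not taken from any library.

```python
def rpkm(gene_reads: int, gene_length_bp: int, total_reads: int) -> float:
    """RPKM = N / (L * T), with L in kilobases and T in millions of reads."""
    length_kb = gene_length_bp / 1_000          # L: gene length in kb
    total_millions = total_reads / 1_000_000    # T: counted reads in millions
    return gene_reads / (length_kb * total_millions)


# Hypothetical gene: 500 reads, 2,000 bp long, library of 10 million counted reads.
example = rpkm(gene_reads=500, gene_length_bp=2_000, total_reads=10_000_000)
print(example)  # 25.0
```

Which reads go into `total_reads` is exactly the open question in this post; the function itself does not decide that.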

 

T_reesei_F24.1_GGCTAC_L008_R1_001.cleanreads.fastq.gz_tophat2.F24h.1_accepted_hits.bam  
Assigned 32270962
Unassigned_Ambiguity 6896
Unassigned_MultiMapping 116803
Unassigned_NoFeatures 10751746
Unassigned_Unmapped 0
Unassigned_MappingQuality 0
Unassigned_FragementLength 0
Unassigned_Chimera 0
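To make the two candidate values of T concrete, here is a small sketch that uses the summary numbers above. It is plain Python arithmetic rather than an Rsubread call, and the variable names are only illustrative.

```python
# featureCounts summary values copied from the post above
# (category names are kept exactly as they appear in the output).
summary = {
    "Assigned": 32270962,
    "Unassigned_Ambiguity": 6896,
    "Unassigned_MultiMapping": 116803,
    "Unassigned_NoFeatures": 10751746,
    "Unassigned_Unmapped": 0,
    "Unassigned_MappingQuality": 0,
    "Unassigned_FragementLength": 0,
    "Unassigned_Chimera": 0,
}

assigned = summary["Assigned"]
all_reads = sum(summary.values())

# T in the formula is expressed in millions of reads.
t_assigned = assigned / 1_000_000
t_all = all_reads / 1_000_000

# Since RPKM = N / (L * T), switching T from assigned-only to all categories
# multiplies every RPKM in this sample by the same factor.
scale = t_assigned / t_all

print(f"T (assigned only):  {t_assigned:.2f} million")
print(f"T (all categories): {t_all:.2f} million")
print(f"RPKM with T = all reads is {scale:.3f} x RPKM with T = assigned reads")
```

Within one sample the choice only rescales all values by a constant; it starts to matter when samples with different unassigned fractions are compared.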

 

Thanks in advance! 

Tags: rpkm, rna-seq, rsubread, R

Well, RPKM is calculated with respect to the total number of mapped reads.

If you are working with reads that map uniquely to the genome, then you should only consider the Assigned reads.

written 4.5 years ago by Manvendra Singh
Devon Ryan (Freiburg, Germany) wrote, 4.5 years ago:

If you include things like Unassigned_Ambiguity in the numerator, then include them in the denominator. Likewise with Unassigned_MultiMapping. Unassigned_NoFeatures could be left as part of the denominator, though I wouldn't include it, since that will bias things by sample quality. Having said that, I wouldn't calculate RPKMs at all, since in my opinion they shouldn't be used, but perhaps you have a good reason.
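The point about sample quality can be illustrated with a small sketch. The two samples, their read counts, and the helper `rpkm` are all made up for illustration; nothing here comes from the poster's data.

```python
def rpkm(gene_reads, gene_length_bp, total_reads):
    """RPKM = N / (L * T), with L in kb and T in millions of reads."""
    return gene_reads / ((gene_length_bp / 1_000) * (total_reads / 1_000_000))


# Two hypothetical samples with identical assigned counts for one gene,
# but different numbers of reads falling outside annotated features.
samples = {
    "clean": {"gene": 1_000, "assigned": 20_000_000, "no_features": 1_000_000},
    "noisy": {"gene": 1_000, "assigned": 20_000_000, "no_features": 10_000_000},
}
gene_length_bp = 2_000

results = {}
for name, s in samples.items():
    results[name] = {
        # Denominator: assigned reads only.
        "assigned_only": rpkm(s["gene"], gene_length_bp, s["assigned"]),
        # Denominator: assigned reads plus Unassigned_NoFeatures.
        "with_no_features": rpkm(
            s["gene"], gene_length_bp, s["assigned"] + s["no_features"]
        ),
    }
    print(name, results[name])
```

With the assigned-only denominator both samples give the same value for the gene; once the no-feature reads are added, the sample with more of them gets a lower RPKM even though its gene-level counts are unchanged.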


The StatOmique consortium tested different normalization methods, and RPKM is the worst one: http://bib.oxfordjournals.org/content/14/6/671.long

modified 3.5 years ago • written 4.5 years ago by Asaf

This really can't be emphasized enough. RPKMs really are a bad solution in search of a problem.

written 4.5 years ago by Devon Ryan

I entirely agree, Devon.

But the problem is: if we want to compare gene expression levels, e.g. across cell lines, what should we rely on other than RPKM?

I think RPKM is a bad solution for shorter transcripts (<500 bp).

written 4.5 years ago by Manvendra Singh

You'd be better off with counts. The really tricky comparison is between organisms, but that's largely an unsolved problem (last I looked, at least).

modified 4.5 years ago • written 4.5 years ago by Devon Ryan

In order to compare between organisms, would it be better to consider only those reads that map uniquely to both genomes,

then divide the read counts in features by the total number of mapped reads,

then normalize them by their quantiles?

Would the data then be ready for comparison?

written 4.5 years ago by Manvendra Singh

The issue is more how things might be meaningfully normalized when the gene sets aren't even the same. But anyway that's off topic to this post.

written 4.5 years ago by Devon Ryan

Yes, certainly. I was just curious.

Thanks

written 4.5 years ago by Manvendra Singh
gustavoborin01 (University of Campinas, Brazil) wrote, 4.5 years ago:

Thank you all! I really appreciated your answers!
