Should I use "assigned reads" or total reads (assigned + unassigned) to calculate the RPKM value?
7.0 years ago

Dear all,

I'm recalculating the RPKM values for an RNA-Seq data set using the featureCounts function in Rsubread, and I'd like to know whether I should use just the "Assigned" reads or the total reads, including the unassigned categories (ambiguity, multimapping...; see below), in the RPKM formula. Looking for the answer in forums and in Mortazavi et al. (2008), I've only found that "N is the total number of mappable reads in the experiment". Could anybody please help in this regard?

RPKM = N/(L*T)

where:

N: number of reads assigned to a gene
L: length of the gene (kb)
T: total mapped reads (Millions)
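
To make the units in the formula concrete, here is a minimal sketch in Python. The function name `rpkm` and the example gene are hypothetical, and the sketch assumes the gene length is given in base pairs and the library total as a raw read count, so both are rescaled before dividing.

```python
def rpkm(gene_reads, gene_length_bp, total_reads):
    """Reads Per Kilobase per Million: N / (L * T)."""
    length_kb = gene_length_bp / 1_000        # L: gene length in kilobases
    total_millions = total_reads / 1_000_000  # T: total reads in millions
    return gene_reads / (length_kb * total_millions)


# Hypothetical gene: 1,000 reads over 2,000 bp in a library of 20 million reads.
example = rpkm(gene_reads=1_000, gene_length_bp=2_000, total_reads=20_000_000)
print(example)
```

Whatever is chosen for T, it has to be the same kind of count for every gene in the sample.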

                           T_reesei_F24.1_GGCTAC_L008_R1_001.cleanreads.fastq.gz_tophat2.F24h.1_accepted_hits.bam      
Assigned                   32270962
Unassigned_Ambiguity       6896
Unassigned_MultiMapping    116803
Unassigned_NoFeatures      10751746
Unassigned_Unmapped        0
Unassigned_MappingQuality  0
Unassigned_FragementLength 0
Unassigned_Chimera         0

Thanks in advance!

rpkm RNA-Seq R Rsubread • 3.9k views

RPKM is calculated with respect to the total number of mapped reads.

If you are working with reads that map uniquely to the genome, then you should consider only the Assigned reads.


Thank you all! I really appreciate your answers!

7.0 years ago

If you include things like Unassigned_Ambiguity in the numerator, then include them in the denominator as well. The same goes for Unassigned_MultiMapping. Unassigned_NoFeatures could be left as part of the denominator, though I wouldn't include it, since that will bias things by sample quality. Having said that, I wouldn't calculate RPKMs at all, since in my opinion they shouldn't be used, but perhaps you have a good reason.
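
The consistency rule above can be sketched with the summary counts from the question. This is only an illustration in Python: `library_size`, `rpkm`, and the example gene are hypothetical names and values, and which categories go into the denominator remains the analyst's choice.

```python
# featureCounts summary from the question (read counts per category).
summary = {
    "Assigned": 32270962,
    "Unassigned_Ambiguity": 6896,
    "Unassigned_MultiMapping": 116803,
    "Unassigned_NoFeatures": 10751746,
}


def library_size(summary, categories=("Assigned",)):
    """Sum the chosen categories to get the RPKM denominator (raw reads)."""
    return sum(summary[name] for name in categories)


def rpkm(gene_reads, gene_length_bp, total_reads):
    """N / (L * T), with L in kilobases and T in millions of reads."""
    return gene_reads / ((gene_length_bp / 1e3) * (total_reads / 1e6))


assigned_only = library_size(summary)
with_no_features = library_size(summary, ("Assigned", "Unassigned_NoFeatures"))

# Hypothetical gene: 1,000 assigned reads over 2,000 bp.
print(rpkm(1_000, 2_000, assigned_only))
print(rpkm(1_000, 2_000, with_no_features))
```

Adding Unassigned_NoFeatures to the denominator lowers every RPKM in the sample by the same factor, so a sample with a larger share of reads outside annotated features would look less expressed across the board.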


The StatOmique consortium tested different normalization methods; RPKM is the worst one: http://bib.oxfordjournals.org/content/14/6/671.long


This really can't be emphasized enough. RPKMs really are a bad solution in search of a problem.


I entirely agree, Devon.

But the problem is: if we want to compare gene expression levels, e.g. across cell lines, what should we rely on other than RPKM?

I think RPKM is a bad solution for shorter transcripts (<500 bp).


You'd be better off with counts. The really tricky comparison is between organisms, but that's largely an unsolved problem (last I looked, at least).


To compare between organisms, would it be better to consider only those reads that map uniquely to both genomes,

then divide the reads counted in features by the total number of mapped reads,

and then quantile-normalize the results?

Would the data then be ready for comparison?


The issue is more how things might be meaningfully normalized when the gene sets aren't even the same. But anyway, that's off topic for this post.


Yes, certainly. I was just curious.

Thanks
