Should I use "assigned reads" or total reads (assigned + unassigned) to calculate the RPKM value?
7.0 years ago

Dear all,

I'm recalculating the RPKM values for an RNA-Seq dataset with the featureCounts function in Rsubread, and I'd like to know whether I should use just the "assigned" reads or the total reads, including "unassigned ambiguity, multimapping..." (see below), in the RPKM formula. Looking for the answer in forums and in Mortazavi et al. (2008), I have only found that "N is the total number of mappable reads in the experiment". Could anybody please help with this?

RPKM = N/(L*T)


where:

N: number of reads assigned to a gene
L: length of the gene (kb)
T: total number of mapped reads, in millions (which reads to count here is my question)

                           T_reesei_F24.1_GGCTAC_L008_R1_001.cleanreads.fastq.gz_tophat2.F24h.1_accepted_hits.bam
Assigned                   32270962
Unassigned_Ambiguity       6896
Unassigned_MultiMapping    116803
Unassigned_NoFeatures      10751746
Unassigned_Unmapped        0
Unassigned_MappingQuality  0
Unassigned_FragementLength 0
Unassigned_Chimera         0
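To make the two options concrete, here is a minimal sketch in Python (not Rsubread code) that applies the formula above with each choice of library size, using the non-zero rows of the summary table. The gene itself (1,000 reads, 2,000 bp) is a made-up example, and the helper name `rpkm` is only illustrative.

```python
# featureCounts summary values copied from the table above (non-zero rows only).
summary = {
    "Assigned": 32270962,
    "Unassigned_Ambiguity": 6896,
    "Unassigned_MultiMapping": 116803,
    "Unassigned_NoFeatures": 10751746,
}


def rpkm(gene_reads, gene_length_bp, library_size):
    """Reads per kilobase of gene per million reads in the library."""
    length_kb = gene_length_bp / 1e3
    library_millions = library_size / 1e6
    return gene_reads / (length_kb * library_millions)


assigned_only = summary["Assigned"]
all_reads = sum(summary.values())

# Hypothetical gene (not from the post): 1,000 assigned reads, 2,000 bp long.
rpkm_assigned = rpkm(1000, 2000, assigned_only)
rpkm_all = rpkm(1000, 2000, all_reads)

print(f"library = Assigned only:  {rpkm_assigned:.2f}")
print(f"library = all categories: {rpkm_all:.2f}")
```

Since the gene count stays the same, the larger library size can only lower the value; what matters is that the same choice is made for every sample being compared.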


rpkm RNA-Seq R Rsubread • 3.9k views

Well, RPKM is calculated with respect to the total number of mapped reads.

If you are working with reads mapped uniquely to the genome, then you should consider only the Assigned reads.

7.0 years ago

If you include things like Unassigned_Ambiguity in the numerator, then include them in the denominator as well. The same goes for Unassigned_MultiMapping. Unassigned_NoFeatures could be left as part of the denominator, though I wouldn't include it, since that will bias things by sample quality. Having said that, I wouldn't calculate RPKMs at all, since in my opinion they shouldn't be used, but perhaps you have a good reason.
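A small sketch of the sample-quality point, in Python with invented numbers: two hypothetical samples have the same assigned reads and the same count for one gene, but sample B has more reads outside annotated features. None of these values come from the original post.

```python
def rpkm(gene_reads, gene_length_bp, library_size):
    """Reads per kilobase of gene per million reads in the library."""
    return gene_reads / ((gene_length_bp / 1e3) * (library_size / 1e6))


# Hypothetical samples: identical Assigned totals, different NoFeatures totals.
samples = {
    "A": {"Assigned": 30_000_000, "Unassigned_NoFeatures": 5_000_000},
    "B": {"Assigned": 30_000_000, "Unassigned_NoFeatures": 15_000_000},
}
gene_reads, gene_length_bp = 1500, 3000  # same gene, same count in both samples

results = {}
for name, s in samples.items():
    results[name] = {
        # Denominator restricted to reads that could enter the numerator.
        "assigned": rpkm(gene_reads, gene_length_bp, s["Assigned"]),
        # Denominator that also counts reads outside any feature.
        "with_nofeatures": rpkm(
            gene_reads,
            gene_length_bp,
            s["Assigned"] + s["Unassigned_NoFeatures"],
        ),
    }

for name, r in results.items():
    print(name, f"{r['assigned']:.2f}", f"{r['with_nofeatures']:.2f}")
```

With the Assigned-only denominator the two samples agree; once Unassigned_NoFeatures is included, sample B looks lower purely because more of its reads fell outside features.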


The StatOmique consortium tested different normalization methods, and RPKM was the worst one: http://bib.oxfordjournals.org/content/14/6/671.long


This really can't be emphasized enough. RPKMs really are a bad solution in search of a problem.


I entirely agree, Devon.

But the problem is: if we want to compare gene expression levels, e.g. across cell lines, what should we rely on other than RPKM?

I think RPKM is a bad solution for shorter transcripts (<500 bp).


You'd be better off with counts. The really tricky comparison is between organisms, but that's largely an unsolved problem (last I looked, at least).


To compare between organisms, would it be better to consider only those reads that map uniquely to both genomes,

then count the reads in features and divide by the total number of mapped reads,

then normalize them by their quantiles?

Would the data then be ready for comparison?
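For reference, the last two steps proposed here (scaling counts by total mapped reads, then quantile normalization) could be sketched as below. This is a plain-Python illustration with invented counts and library sizes, it assumes both samples already share the same list of genes, and ties are broken by input order rather than averaged.

```python
def quantile_normalize(columns):
    """Quantile-normalize samples given as equal-length lists of values."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the k-th smallest value across all samples.
    rank_means = [
        sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)
    ]
    normalized = []
    for col in columns:
        # Gene indices ordered from lowest to highest value in this sample.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, gene_index in enumerate(order):
            out[gene_index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical counts for four shared genes in two samples.
counts = [[10, 200, 50, 0], [30, 500, 90, 5]]
total_mapped = [1_000_000, 2_500_000]  # hypothetical library sizes

# Step 1: counts per million mapped reads.
cpm = [[c / total * 1e6 for c in col] for col, total in zip(counts, total_mapped)]

# Step 2: force both samples onto the same distribution of values.
normalized = quantile_normalize(cpm)
print(normalized)
```

After this step every sample has exactly the same set of values, only assigned to different genes, so the sketch shows the mechanics only and says nothing about whether the comparison is meaningful.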


The issue is more how things might be meaningfully normalized when the gene sets aren't even the same. But anyway that's off topic to this post.


Yes, certainly. I was just curious.

Thanks
