Question: RNA-seq quality control
9 weeks ago by
United States
Ron480 wrote:


While looking at the RNA-seq mapping statistics from STAR (same batch, same library prep), I often see the number of mapped reads range from 18 million to 40 million. Both kinds of samples (20 million vs. 40 million mapped reads) can have a median log FPKM of around 1, which I think is considered a good-quality sample for downstream processing.

However, despite having, say, 40 million mapped reads, some samples end up with a median log FPKM of around 0.5.

I have a few questions in this regard.

Can we compare such a sample within the same cohort as one that has a median log FPKM of around 1 (both having 40 million mapped reads)?

Can we compare a sample with 20 million mapped reads and a median log FPKM of around 0.5 to a sample with 40 million mapped reads and a median log FPKM of around 1, or vice versa (which is also possible)?

Is there a minimum number of mapped reads that should be used as a threshold for passing QC, e.g. 20 million mapped reads?

If we cannot compare them directly, should we apply batch correction before comparing them, even though they are from the same batch?

Lastly, I am using ( for getting a count of rRNA reads. Do these reads form part of the mapped reads, or are they part of the overall input reads? Since I am taking the mapped BAM file, my guess is that they are counted only among the mapped reads, and that if they are numerous, this results in lower-quality FPKMs even though we have 40 million or more mapped reads (i.e. sufficient mapped reads).



rna-seq star qc alignment next-gen • 223 views
modified 9 weeks ago by Devon Ryan • written 9 weeks ago by Ron
9 weeks ago by
Freiburg, Germany
Devon Ryan60k wrote:
  1. FPKMs shouldn't be used for any actual statistics, so having a different median value isn't an issue.
  2. The absolute number of mapped reads isn't the important thing, rather the difference in numbers. In general, you tend to run into issues when there's more than a 10x difference in the number of alignments between libraries.
  3. You don't have a batch to correct for.
  4. Again, you shouldn't do statistics with FPKMs. Just take the counts from featureCounts (or STAR, which I think can directly output them these days).
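
The rule of thumb in point 2 can be checked directly on a count table. Below is a minimal sketch, assuming the per-gene counts (e.g. from featureCounts) have already been loaded into a dictionary keyed by sample name; the sample names and numbers are made up for illustration, and `library_size_ratio` is a hypothetical helper, not part of any tool mentioned here.

```python
def library_size_ratio(counts_by_sample):
    """Return (ratio, smallest, largest) based on total counts per library."""
    totals = {name: sum(counts) for name, counts in counts_by_sample.items()}
    smallest = min(totals, key=totals.get)
    largest = max(totals, key=totals.get)
    if totals[smallest] == 0:
        raise ValueError(f"library {smallest!r} has no counts")
    return totals[largest] / totals[smallest], smallest, largest


# Hypothetical per-gene counts for three libraries (made-up numbers).
counts_by_sample = {
    "sample_A": [1200, 30, 0, 560],
    "sample_B": [2500, 75, 4, 1100],
    "sample_C": [900, 12, 1, 400],
}

ratio, smallest, largest = library_size_ratio(counts_by_sample)
flagged = ratio > 10  # the roughly 10x rule of thumb from point 2
print(f"{largest} / {smallest} = {ratio:.2f}x, flagged: {flagged}")
```

By this measure an 18 million vs. 40 million read spread is only about a 2x difference, well inside the range described in point 2.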
written 9 weeks ago by Devon Ryan

Hi Devon,

Thanks! I have an example where most samples have similar mapping statistics and similar FPKM quality, but one sample in the batch has similar mapping statistics yet different FPKM quality, and it clusters separately. I am not sure whether it's a real difference or a batch effect. All samples are RNA-seq from the same disease.

On another note, I had a further question: are rRNA reads included in the BAM file as well, or only in the total reads?

written 9 weeks ago by Ron

Whether rRNA alignments are included in the BAM file and/or the FPKMs depends on how you made both. My guess is that your FPKMs are calculated using total reads, rather than mapped reads, and that you just have differences in rRNA amounts between the samples.
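
To see why differing rRNA amounts alone could move the median, here is a small worked sketch. It assumes the standard definition FPKM = fragments × 10^9 / (gene length in bp × total fragments), that rRNA fragments are counted in that total (the guess above, not something verified for this pipeline), and that the logs in question are base 10. The gene counts, lengths, and rRNA fractions are made up.

```python
import math
from statistics import median


def fpkm(fragments, length_bp, library_total):
    """Fragments per kilobase of transcript per million fragments."""
    return fragments * 1e9 / (length_bp * library_total)


# Hypothetical genes as (fragments, length in bp); identical in both scenarios.
genes = [(500, 2000), (120, 1500), (3000, 4000), (40, 800), (900, 2500)]
non_rrna_fragments = 20_000_000  # hypothetical non-rRNA fragments per library


def median_log10_fpkm(rrna_fraction):
    # The denominator grows when rRNA makes up a larger share of the library.
    library_total = non_rrna_fragments / (1.0 - rrna_fraction)
    return median(math.log10(fpkm(f, l, library_total)) for f, l in genes)


well_depleted = median_log10_fpkm(0.02)    # 2% rRNA (assumed)
poorly_depleted = median_log10_fpkm(0.60)  # 60% rRNA (assumed)
shift = poorly_depleted - well_depleted
print(f"median log10 FPKM shift: {shift:.2f}")
```

Under these assumptions every gene's log FPKM drops by the same constant, log10((1 - 0.60) / (1 - 0.02)), even though the underlying expression is identical, which is one way a sample with plenty of mapped reads can still show a lower median.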

written 9 weeks ago by Devon Ryan

FPKMs and rRNA read counts are calculated from STAR-aligned BAM files.

written 9 weeks ago by Ron
  1. Aligned against what?
  2. FPKMs calculated by what?

The exact details here are important.

written 9 weeks ago by Devon Ryan

Human samples aligned against the human genome hg19 (STAR, default parameters); FPKMs calculated by Cufflinks (default parameters).

modified 9 weeks ago • written 9 weeks ago by Ron

As long as that reference contained the GL000228.1 contig, the BAM file contains 45S rRNA alignments. Since you used Cufflinks, it's likely that you're just seeing a difference in rRNA depletion. If that sample is causing problems, either exclude it or make an rRNA-presence covariate that can be added to your GLM.
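
One rough way to build such a covariate is sketched below. It assumes you have the text output of `samtools idxstats` for each BAM (tab-separated columns: reference name, length, mapped count, unmapped count) and uses the fraction of mapped reads on GL000228.1 as a crude proxy for rRNA content; not every read on that contig is necessarily rRNA. The read counts, sample names, and the `group` column are made up, and `contig_fraction` is a hypothetical helper.

```python
def contig_fraction(idxstats_text, contig="GL000228.1"):
    """Fraction of mapped reads on `contig`, from `samtools idxstats` text."""
    on_contig = 0
    total = 0
    for line in idxstats_text.strip().splitlines():
        name, _length, mapped, _unmapped = line.split("\t")
        total += int(mapped)
        if name == contig:
            on_contig += int(mapped)
    return on_contig / total if total else 0.0


# Made-up idxstats output for two samples (counts are illustrative only).
IDXSTATS_S1 = "chr1\t249250621\t900000\t0\nGL000228.1\t129120\t100000\t0\n*\t0\t0\t5000\n"
IDXSTATS_S2 = "chr1\t249250621\t600000\t0\nGL000228.1\t129120\t400000\t0\n*\t0\t0\t7000\n"

# Hypothetical sample sheet: (sample name, group of interest, idxstats text).
samples = [
    ("s1", "A", IDXSTATS_S1),
    ("s2", "B", IDXSTATS_S2),
]

# Design matrix rows: intercept, group indicator, rRNA-proxy covariate.
design = [
    [1.0, 1.0 if group == "B" else 0.0, contig_fraction(text)]
    for _name, group, text in samples
]
for (name, _group, _text), row in zip(samples, design):
    print(name, row)
```

The resulting third column could then be supplied as a continuous covariate to whichever GLM-based tool is fitted on the raw counts.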

written 9 weeks ago by Devon Ryan