Question: Which software do you use for RNA-seq data quality control?
5.1 years ago by
Cambridge, US
Christian2.8k wrote:

I am specifically interested in (RNA-seq specific) quality metrics not delivered by FastQC, for example 5'-3' coverage bias of transcripts, percentage of reads mapping to exons vs. introns, ratio of known to novel splice junctions, rRNA contamination, strand specificity, GC bias, etc. 

Reading this post on Biostars, I learned about RSeQC and RNA-SeQC, which look like a good fit. However, neither has attracted many citations since 2012, so I wonder what other software is popular for RNA-seq data quality control. Such software must surely be running in dozens of RNA-seq data analysis pipelines around the world.

rna-seq quality-control qc • 14k views
modified 3.1 years ago by Michael Dondrup46k • written 5.1 years ago by Christian2.8k

I like RSeQC. You could also check out this presentation for some other suggestions.

written 5.1 years ago by Mikael Huss4.6k

We used RNA Seq and some custom-written quality metrics for this paper:

I would not trust citations as a metric at all, because people rarely cite software that is used for quality assessment, though they should. Most people only cite software that is used to produce a result in the paper. I know that Picard's CollectRnaSeqMetrics is run on most RNA datasets that come off the Broad sequencers, but I doubt the users cite it much. They just use the quality metrics to know whether the data is good or not.

written 5.0 years ago by Michele Busby2.0k

"We used RNA Seq and some custom-written quality metrics for this paper:"

Did you mean you used RNA-SeQC?

modified 13 months ago • written 13 months ago by olechnwin0
5.1 years ago by
Washington University School of Medicine, St. Louis, USA
Malachi Griffith17k wrote:

Picard has a module, 'CollectRnaSeqMetrics', that is relevant. More generic options that are not specific to RNA-seq data but can still be useful include FastQC, BAMstats, SAMstat, and samtools flagstat.
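To illustrate how the CollectRnaSeqMetrics output might be consumed downstream, here is a minimal Python sketch. It assumes the usual Picard metrics text layout (a `## METRICS CLASS` line followed by a tab-separated header row and one row of values). The function name `parse_picard_metrics` and the numbers in the example are made up for illustration and do not come from this thread.

```python
def parse_picard_metrics(text):
    """Return the first metrics row of a Picard-style metrics file as a dict.

    Assumption: the header row and the value row directly follow the
    '## METRICS CLASS' line and are tab-separated.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            if i + 2 >= len(lines):
                break  # section is truncated, fall through to the error
            header = lines[i + 1].split("\t")
            values = lines[i + 2].split("\t")
            return dict(zip(header, values))
    raise ValueError("no complete '## METRICS CLASS' section found")


# Hypothetical example content; the values are invented.
example = (
    "## METRICS CLASS\tpicard.analysis.RnaSeqMetrics\n"
    "PCT_RIBOSOMAL_BASES\tPCT_CODING_BASES\tPCT_INTRONIC_BASES\n"
    "0.02\t0.61\t0.12\n"
)
metrics = parse_picard_metrics(example)
print(metrics["PCT_CODING_BASES"])
```

Values are returned as strings so that empty fields survive; convert them with `float()` where needed.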

We also produce a variety of custom metrics from the 'junctions.bed' file that you get along with TopHat alignments. We have found that the degree to which your RNA-seq library represents the breadth of known exon-exon junctions across many loci can be an indicator of overall RNA-seq data quality.

For example, you might ask: for how many genes do we observe reads supporting expression of at least 75% of the known exon-exon junctions of the transcripts annotated for that locus?
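The question above can be sketched in Python as follows. This is not our actual implementation, only an illustration under simplifying assumptions: junctions are assumed to be already extracted as `(chrom, start, end)` tuples, both from the annotation and from the aligner output, and the function name `genes_with_junction_support` and the toy data are hypothetical.

```python
def genes_with_junction_support(known, observed, min_fraction=0.75):
    """List genes for which enough of the known junctions are observed.

    known:    dict mapping gene -> set of (chrom, start, end) junctions
    observed: set of (chrom, start, end) junctions supported by reads
    """
    supported = []
    for gene, junctions in known.items():
        if not junctions:
            continue  # single-exon genes have no junctions to evaluate
        fraction = len(junctions & observed) / len(junctions)
        if fraction >= min_fraction:
            supported.append(gene)
    return sorted(supported)


# Toy data, invented for illustration.
known = {
    "geneA": {("chr1", 100, 200), ("chr1", 300, 400),
              ("chr1", 500, 600), ("chr1", 700, 800)},
    "geneB": {("chr2", 10, 20), ("chr2", 30, 40)},
    "geneC": set(),
}
observed = {("chr1", 100, 200), ("chr1", 300, 400),
            ("chr1", 500, 600), ("chr2", 10, 20)}

print(genes_with_junction_support(known, observed))  # ['geneA']
```

The number of genes passing the threshold, relative to the number of multi-exon genes, then gives a single per-library value that can be compared across samples.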

modified 5.1 years ago • written 5.1 years ago by Malachi Griffith17k

This paper discusses some additional metrics: Quality Control of RNA-Seq Experiments.

written 4.5 years ago by Malachi Griffith17k
3.1 years ago by
Bergen, Norway
Michael Dondrup46k wrote:

Creating a single report per file is a problem, because this approach doesn't scale. Instead, I would prefer tools that tabulate quality metrics for many samples and files, making it easy to get a quick overview of the results in batches. Also, the alignment statistics are often at least as important a quality metric as the read qualities.
Another requirement could be that most of the data is accessed remotely, so the QC tool should ideally work headless.
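The "one table for many samples" idea can be sketched in a few lines of Python. This is only an illustration and not how MultiQC or any other tool is implemented: the function `tabulate_metrics`, the metric names, and the values are all hypothetical.

```python
import csv
import io


def tabulate_metrics(per_sample, out):
    """Write one tab-separated row per sample and one column per metric.

    per_sample: dict mapping sample name -> dict of metric name -> value
    out:        any writable text stream (a file, sys.stdout, ...)
    Metrics missing for a sample are written as 'NA'.
    """
    columns = sorted({name for m in per_sample.values() for name in m})
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample"] + columns)
    for sample in sorted(per_sample):
        row = [per_sample[sample].get(name, "NA") for name in columns]
        writer.writerow([sample] + row)


# Invented values, for illustration only.
per_sample = {
    "s2": {"pct_mapped": 91.2},
    "s1": {"pct_mapped": 88.0, "pct_rrna": 1.5},
}
buffer = io.StringIO()
tabulate_metrics(per_sample, buffer)
print(buffer.getvalue())
```

Because it only writes to a text stream, a script like this runs headless on a remote machine, and the resulting table can be inspected or plotted elsewhere.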

These tools were recommended recently here:

This is a perfect application for MultiQC by Phil Ewels.

I will try MultiQC. It can also summarize alignment statistics, which is a very useful feature, and it has a nice web page with introductory screencasts and documentation.

  • The installation via pip was very smooth (using the local install option).
  • The first test run using STAR logs went OK. One small problem: no reports were found at first, because the default file names are hard-coded, but this can be configured.
  • Check: in case your pipeline renames reports.

AfterQC is another great QC tool for FASTQ files.

modified 3.1 years ago • written 3.1 years ago by Michael Dondrup46k

Wow, MultiQC is amazing. It really worked on the first try on a directory with hundreds of log files.

written 20 months ago by ATpoint19k

Yes. Another thumbs up for multiqc

written 13 months ago by olechnwin0
3.8 years ago by
madk00k340 wrote:

A new version of the open-source Qualimap tool provides additional features specific to RNA-seq data quality control. Most importantly, multi-sample data analysis is now supported, which makes it possible to detect outliers. Here is a link to the publication, which includes a detailed comparison of Qualimap 2 with RSeQC and RNA-SeQC:

Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data




modified 3.8 years ago • written 3.8 years ago by madk00k340