Which software do you use for RNA-seq data quality control?
7.5 years ago
Christian ★ 3.0k

I am specifically interested in (RNA-seq specific) quality metrics not delivered by FastQC, for example 5'-3' coverage bias of transcripts, percentage of reads mapping to exons vs. introns, ratio of known to novel splice junctions, rRNA contamination, strand specificity, GC bias, etc.

Reading this post on Biostars, I learned about RSeQC and RNA-SeQC, which look like a good fit. However, they have not attracted many citations since 2012, so I was wondering what other software is popular for RNA-seq data quality control. Something must be running in the dozens of RNA-seq data analysis pipelines around the world.

RNA-Seq quality-control qc • 18k views

I like RSeQC. You could check out this presentation: http://www.slideshare.net/mikaelhuss/all-bio-rnaseqqc for some other suggestions


We used RNA Seq and some custom-written quality metrics for this paper:

http://www.nature.com/nmeth/journal/v10/n7/full/nmeth.2483.html

I would not trust citations as a metric at all, because people do not cite software that is used for quality assessment, though they should. Most people only cite software that is used to produce a result in the paper. I know that Picard's CollectRnaSeqMetrics is run on most RNA datasets that come off the Broad sequencers, but I doubt the users cite it much. They just use quality metrics to know whether the data is good or not.


"We used RNA Seq and some custom-written quality metrics for this paper:"

Did you mean you used RNA-SeQC?

7.5 years ago

Picard has a module 'CollectRnaSeqMetrics' that is relevant. Also, not specific to RNA-seq data, but other more generic options that can be useful include: FastQC, BAMstats, SAMstat, samtools flagstat, etc.
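As a rough sketch of how the CollectRnaSeqMetrics output can be read back into a script: Picard metrics files are tab-separated, with a `## METRICS CLASS` line followed by a header row and a value row. The function name `parse_picard_metrics` and the trimmed example file below are made up for illustration; a real file has many more columns.

```python
def parse_picard_metrics(text):
    """Return the first metrics row of a Picard metrics file as a dict.

    Assumes the usual layout: a '## METRICS CLASS' line, then a
    tab-separated header line, then a tab-separated value line.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            values = lines[i + 2].split("\t")
            return dict(zip(header, values))
    raise ValueError("no METRICS CLASS section found")


# Hypothetical, heavily trimmed example file (values are invented).
example = "\n".join([
    "## htsjdk.samtools.metrics.StringHeader",
    "# CollectRnaSeqMetrics (command line omitted)",
    "",
    "## METRICS CLASS\tpicard.analysis.RnaSeqMetrics",
    "PCT_RIBOSOMAL_BASES\tPCT_CODING_BASES\tPCT_INTRONIC_BASES\tMEDIAN_5PRIME_TO_3PRIME_BIAS",
    "0.01\t0.62\t0.08\t0.85",
])

metrics = parse_picard_metrics(example)
for name, value in metrics.items():
    print(name, value, sep="\t")
```

The values come back as strings, so convert them with `float()` before applying any thresholds.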

We also produce a variety of custom metrics from the junctions.bed that you get along with TopHat alignments. We have found that evaluating the degree to which your RNA-seq library represents the breadth of known exon-exon junctions across many loci can be an indicator of overall RNA-seq data quality.

For example, you might ask the question, for how many genes do we observe reads supporting expression of at least 75% of the known exon-exon junctions of transcripts annotated for that locus?
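A minimal sketch of that junction-breadth question, not our actual code. It assumes the TopHat junctions.bed layout, where each junction is a two-block BED12 line and the intron lies between the two blocks, and it assumes the known junctions have already been extracted from the annotation into a dict using the same coordinate convention. The function names, the `min_reads` cutoff, and the example data are all hypothetical.

```python
def read_observed_junctions(bed_lines, min_reads=1):
    """Collect (chrom, intron_start, intron_end) from junctions.bed lines.

    Assumes BED12 with two blocks per junction; the score column (field 5)
    is taken as the number of supporting reads.
    """
    observed = set()
    for line in bed_lines:
        if not line.strip() or line.startswith("track"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        score = int(fields[4])
        block_sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
        if score >= min_reads:
            # The intron spans from the end of the first block
            # to the start of the last block.
            observed.add((chrom, start + block_sizes[0], end - block_sizes[-1]))
    return observed


def genes_with_junction_breadth(known_by_gene, observed, min_fraction=0.75):
    """Return genes for which >= min_fraction of known junctions are observed."""
    passing = []
    for gene, junctions in known_by_gene.items():
        if not junctions:
            continue  # single-exon genes have no junctions to evaluate
        fraction = len(junctions & observed) / len(junctions)
        if fraction >= min_fraction:
            passing.append(gene)
    return sorted(passing)


# Invented example: two observed junctions, two annotated genes.
bed = [
    "track name=junctions",
    "chr1\t100\t300\tJUNC1\t12\t+\t100\t300\t255,0,0\t2\t50,40\t0,160",
    "chr1\t450\t650\tJUNC2\t3\t+\t450\t650\t255,0,0\t2\t50,50\t0,150",
]
known = {
    "geneA": {("chr1", 150, 260)},
    "geneB": {("chr1", 500, 600), ("chr1", 700, 800)},
}

observed = read_observed_junctions(bed)
genes = genes_with_junction_breadth(known, observed)
print(genes)
```

The number of passing genes, or the fraction of multi-exon genes that pass, can then be compared across libraries.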


This paper discusses some additional metrics: Quality Control of RNA-Seq Experiments.

5.5 years ago

There is a problem with creating a single report per file: this approach doesn't scale. Instead, I would prefer that tools tabulate quality metrics for many samples and files, making it easy to get a quick overview of the results in batches. Also, alignment statistics are often at least as important a quality metric as read qualities.
Another requirement could be that most of the data is accessed remotely, so the QC tool would ideally run headless.
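To illustrate what I mean by tabulating, here is a minimal headless sketch that turns several STAR `Log.final.out` files into one table. It assumes the usual `name | value` line layout of those logs; the function names and the two example logs are made up, and in practice the log text would be read from files.

```python
def parse_star_log(text):
    """Parse 'name | value' lines from a STAR Log.final.out into a dict."""
    metrics = {}
    for line in text.splitlines():
        if "|" in line:
            key, value = line.split("|", 1)
            metrics[key.strip()] = value.strip()
    return metrics


def tabulate(logs_by_sample, fields):
    """Build one row per sample with the requested fields ('NA' if missing)."""
    rows = [["sample"] + list(fields)]
    for sample in sorted(logs_by_sample):
        metrics = parse_star_log(logs_by_sample[sample])
        rows.append([sample] + [metrics.get(f, "NA") for f in fields])
    return rows


# Invented, trimmed log contents for two samples.
logs = {
    "s1": "   Number of input reads |\t1000000\n   Uniquely mapped reads % |\t90.00%\n",
    "s2": "   Number of input reads |\t800000\n   Uniquely mapped reads % |\t62.50%\n",
}
fields = ["Number of input reads", "Uniquely mapped reads %"]

rows = tabulate(logs, fields)
for row in rows:
    print("\t".join(row))
```

The tab-separated output can be sorted or filtered on the command line, which makes outlier samples easy to spot in a batch.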

These tools were recommended recently here:

This is a perfect application for MultiQC by Phil Ewels.

I will try MultiQC. It can also summarize alignment statistics, which is a very useful feature, and it has a nice web page with introductory screencasts and documentation.

• The installation via pip was very smooth (using the local install option).
• The first test run using STAR logs went OK. One small problem was that no reports were found at first, because the default file names are hard-coded, but they can be configured.
• Check: http://multiqc.info/docs/#configuring-multiqc in case your pipeline renames reports.

AfterQC is another great QC tool for fastq.


Wow, MultiQC is amazing. It really worked on the first try on a directory with hundreds of log files.


Yes. Another thumbs up for MultiQC.

6.1 years ago

A new version of the open-source Qualimap tool provides additional features specific to RNA-seq data quality control. Most importantly, multi-sample analysis is now supported, which makes it possible to detect outliers. Here's a link to the publication, which includes a detailed comparison of Qualimap 2 to RSeQC and RNA-SeQC:

Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data