Post-alignment QC of RNA-seq data
4.7 years ago
KVC_bioinfo ▴ 550

Hello,

I have aligned my RNA-seq data to the human genome, and I used FastQC to check the raw reads before alignment. Can I also use FastQC for post-alignment QC, or is there a better way of doing it?

qc fastqc • 6.5k views
8
4.7 years ago

RSeQC is used a lot, but I find QoRTs a bit more user-friendly, especially if you have numerous samples.

In addition, feeding the results of a) FastQC, b) STAR (or whatever aligner you've used), and c) featureCounts to MultiQC is already quite useful.

The typical things you want to look out for:

• at least 80% alignment rate
• not too many intronic/intergenic reads
• even gene body coverage
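
To make these checks concrete, here is a minimal Python sketch. It assumes you already have the read counts and a per-percentile gene body coverage profile from whatever tool you ran. The function names, the 30% cutoff for intronic/intergenic reads, and the 3'/5' ratio are illustrative assumptions of mine, not values taken from any of the tools mentioned; only the 80% alignment rate comes from the list above.

```python
def check_alignment_qc(total_reads, aligned_reads, intronic_intergenic_reads,
                       min_alignment_rate=0.80, max_non_exonic_fraction=0.30):
    """Compare basic post-alignment numbers against rule-of-thumb cutoffs.

    max_non_exonic_fraction is a hypothetical placeholder; adjust it to
    your library type and biological question.
    """
    if total_reads <= 0 or aligned_reads <= 0:
        raise ValueError("read counts must be positive")
    alignment_rate = aligned_reads / total_reads
    non_exonic_fraction = intronic_intergenic_reads / aligned_reads
    return {
        "alignment_rate": alignment_rate,
        "non_exonic_fraction": non_exonic_fraction,
        "alignment_ok": alignment_rate >= min_alignment_rate,
        "non_exonic_ok": non_exonic_fraction <= max_non_exonic_fraction,
    }


def three_to_five_prime_ratio(coverage):
    """Ratio of mean coverage in the last fifth (3' end) of the gene body
    to the first fifth (5' end). Values near 1 suggest even coverage."""
    n = len(coverage) // 5
    if n == 0:
        raise ValueError("need at least 5 coverage values")
    five_prime = sum(coverage[:n]) / n
    three_prime = sum(coverage[-n:]) / n
    if five_prime == 0:
        raise ValueError("no coverage at the 5' end")
    return three_prime / five_prime


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    print(check_alignment_qc(1_000_000, 900_000, 90_000))
    print(three_to_five_prime_ratio([10] * 100))
```

A ratio well above 1 would point towards 3' bias, which is often associated with degraded RNA in poly(A)-selected libraries, but what counts as "too much" depends on the experiment.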
0

Could you suggest a publication where such parameters are described, or one that describes QC of RNA-seq data (pre- and post-analysis)?

3

I would expect that the papers of QoRTs and RSeQC may contain a discussion of that.

The classic resource for basic RNA-seq measures is the ENCODE recommendation, although it's a bit dated by now. There may be more up-to-date guidelines on their website; I haven't checked in a while.

An interesting read regarding the importance of which annotation you choose is Zhao & Zhang (2015) BMC Genomics. Regarding gene body coverage, I believe Lahens et al. (2014) Genome Biology 15:R86 discussed that nicely.

If you really want to learn about all the details of RNA-seq pre-processing, you may want to have a look at the notes that I compiled for a class I teach. It's more than 80 pages and fairly detailed especially in regard to raw and aligned read handling. The github repo is this: https://github.com/friedue/course_RNA-seq2017

0

Hello,

Thank you very much. The introduction to RNA-seq on your GitHub page is really very helpful.

1

I should add that it's not completely straightforward to give definite thresholds. Whether an experiment failed or has acceptable results will depend on the specific circumstances of your experiment and the biological question that you want to address. For example, if you specifically expect many unannotated transcripts to be present in your sample, then increased numbers of intergenic and/or intronic reads may not be worrisome, but expected.

0

Is it usual for human samples prepared by rRNA depletion to have higher intergenic/intronic content? I was working with publicly available data and observed this. It seems logical because these samples will contain ncRNA, but I am not totally sure.

1

You will also get more immature/unspliced transcripts, which will still contain introns. The intergenic content shouldn't be dramatically elevated IMO, but usually you'll see considerably more introns than what you'd get with Poly(A) enrichment.

0

Yes. Using Picard, I calculated the percentage of bases aligned to intronic and intergenic regions and found <15% for poly(A) samples and ~25% for rRNA-depleted samples, so I believe this should not be worrisome. A related question: can the presence of ncRNA in rRNA-depleted samples produce multiple peaks in the FastQC GC content curve?
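
For anyone who wants to pull these percentages out programmatically, here is a rough Python sketch. It assumes the numbers came from Picard's CollectRnaSeqMetrics (the post does not say which Picard tool was used) and that the metrics file has the usual layout: a line starting with "## METRICS CLASS", then a tab-separated header line, then a line of values. As far as I know, the relevant columns are PCT_INTRONIC_BASES and PCT_INTERGENIC_BASES, reported as fractions between 0 and 1; please verify against your own output. The function names are mine.

```python
def read_picard_metrics(text):
    """Return the first metrics row of a Picard metrics file as a dict.

    Assumes the header and value lines directly follow the
    '## METRICS CLASS' line and are tab-separated.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            values = lines[i + 2].split("\t")
            return dict(zip(header, values))
    raise ValueError("no '## METRICS CLASS' section found")


def non_exonic_fraction(metrics):
    """Sum of the intronic and intergenic fractions (assumed column names)."""
    return (float(metrics["PCT_INTRONIC_BASES"])
            + float(metrics["PCT_INTERGENIC_BASES"]))


if __name__ == "__main__":
    # Made-up excerpt for illustration only, not real data.
    example = (
        "## METRICS CLASS\tpicard.analysis.RnaSeqMetrics\n"
        "PF_BASES\tPCT_INTRONIC_BASES\tPCT_INTERGENIC_BASES\n"
        "1000000\t0.15\t0.10\n"
    )
    metrics = read_picard_metrics(example)
    print(f"{non_exonic_fraction(metrics):.2%}")
```

Running this over all samples makes it easy to compare poly(A) and rRNA-depleted libraries side by side.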

1
4.7 years ago

FASTQC is primarily for pre-alignment and it takes as input FASTQ or FASTA files. After you perform your alignment, you should have produced a SAM or BAM file, which are not used as input for FASTQC.

The most common post-alignment quality control metric is the fraction of your reads that aligned to the reference genome. The command samtools flagstat reports this.
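
What such a count boils down to can be sketched in plain Python on SAM text, using the FLAG field (column 2). This is only an illustration of the idea, not a replacement for an established tool; the function name is mine. Secondary (0x100) and supplementary (0x800) records are skipped so that multi-mapping reads are not counted more than once, and a record counts as mapped when the unmapped bit (0x4) is not set.

```python
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def mapping_rate(sam_lines):
    """Fraction of primary SAM records that are mapped.

    Each mate of a paired-end read is its own record, so this counts
    records rather than fragments.
    """
    total = 0
    mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read only once, via its primary record
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    if total == 0:
        raise ValueError("no alignment records found")
    return mapped / total


if __name__ == "__main__":
    import sys
    # Example use: samtools view -h sample.bam | python this_script.py
    print(f"{mapping_rate(sys.stdin):.2%}")
```

One caveat: if the aligner did not write unmapped reads to the file (I believe STAR leaves them out unless told otherwise), the result will be close to 100% and the aligner's own log is the better source for the alignment rate.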

1

I read in the FastQC manual that it also accepts SAM and BAM input.

0

Yes, because some platforms produce these as unaligned files.

0

I am using the STAR aligner.

0
4.7 years ago
Ron ★ 1.1k

Post-alignment, you can use RSeQC: http://rseqc.sourceforge.net/

It has very good modules for working with BAM files.

Also, if you use STAR for alignment, check the Log.final.out file for QC metrics.

0

Yes, I used the STAR aligner, and it gave all the statistics in the Log.final.out file. What is the next QC step I should perform?

0
1. look at the STAR statistics
2. run QoRTs or RSeQC
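
For the first step, with many samples it can help to read the STAR statistics programmatically rather than by eye (MultiQC does this for you, too). Below is a minimal Python sketch that assumes the usual Log.final.out layout of "description | value" lines; the function names are mine, and the example text is made up rather than copied from a real run.

```python
def parse_star_log(text):
    """Turn 'description | value' lines from STAR's Log.final.out into a dict."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section titles and blank lines carry no value
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def percent(stats, key):
    """Convert a value such as '85.00%' to the float 85.0."""
    return float(stats[key].rstrip("%"))


if __name__ == "__main__":
    # Illustrative excerpt only; real files contain many more lines.
    example = (
        "          Number of input reads |\t1000000\n"
        "                   UNIQUE READS:\n"
        "        Uniquely mapped reads % |\t85.00%\n"
        "                             MULTI-MAPPING READS:\n"
        " % of reads mapped to multiple loci |\t5.00%\n"
    )
    stats = parse_star_log(example)
    mapped = (percent(stats, "Uniquely mapped reads %")
              + percent(stats, "% of reads mapped to multiple loci"))
    print(f"mapped (unique + multi): {mapped:.2f}%")
```

The key strings are written as I remember them from STAR output, so check them against your own Log.final.out before relying on this.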