How to plot alignment statistics; how many reads are mapped to the genome?
2.1 years ago
anamaria ▴ 180

Hello,

I am doing RNA-seq analysis. I will be performing these steps:

hisat2 -p 12 --new-summary --summary-file $OUTPUT.hisat2.summary -x $REF -1 $R1 -2 $R2 -S $OUTPUT.sam
# 1- Convert sam to bam
samtools view -bS -o $OUTPUT.bam $OUTPUT.sam
# 2- Sort bam file
samtools sort $OUTPUT.bam -o $OUTPUT.sorted.bam
# 3- Generate index for bam file
samtools index $OUTPUT.sorted.bam


I know that I can get the number of mapped and unmapped reads with:

samtools view -b -f 2 $OUTPUT.bam > mapped.bam
samtools view -b -F 2 $OUTPUT.bam > unmapped.bam
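A note on the flags, in case it helps: in the SAM format, flag 2 (0x2) means "properly paired" rather than "mapped", so -F 2 also keeps reads that mapped without being properly paired. The unmapped bit is 4 (0x4). Below is a minimal sketch that counts records instead of writing new BAM files. It assumes samtools is on the PATH, and the function name mapping_counts is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: mapping_counts is a hypothetical helper, not part of samtools.
# Prints one tab-separated line: <bam> <mapped> <unmapped>
mapping_counts() {
    local bam="$1"
    local mapped unmapped
    # -c counts records instead of printing them.
    # -F 260 excludes unmapped (4) and secondary (256) records,
    # so secondary alignments of multi-mapped reads are not counted again.
    mapped=$(samtools view -c -F 260 "$bam")
    # -f 4 keeps only records with the unmapped bit set.
    unmapped=$(samtools view -c -f 4 "$bam")
    printf '%s\t%s\t%s\n' "$bam" "$mapped" "$unmapped"
}

# Example use (assumes the sorted BAM from the steps above exists):
# mapping_counts "$OUTPUT.sorted.bam" >> mapping_counts.tsv
```

For paired-end data these are counts of individual mates, not pairs. The resulting table can be loaded into any plotting tool; samtools flagstat prints a similar breakdown without any scripting.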


Can someone please recommend code to make a plot like the one attached?

2.1 years ago

MultiQC will automatically generate reports for HISAT2 and a number of other tools, such as FastQC.

Thank you so much! So basically, if I use hisat2 with the --new-summary flag, I will get summary stats that I can use with MultiQC to generate plots? Do you have any tutorial on how exactly MultiQC is used for that purpose?

Or all I need to run is: multiqc .

and it will generate the output from whatever it finds in the current directory? Please advise.

It will look through all the files and directories contained within the directory you specify for compatible results/reports.

So, for example, if you have a project directory containing one directory with your FastQC results and another with your HISAT2 results, specifying that project directory will generate a single report that includes both.
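As a rough sketch of that layout and call, assuming MultiQC is installed and on the PATH (the directory names and the run_multiqc wrapper are made up for illustration):

```shell
#!/usr/bin/env bash
# Hypothetical project layout (names are illustrative only):
#   project/fastqc/   FastQC output
#   project/hisat2/   the *.hisat2.summary files written via --summary-file
#
# run_multiqc is a hypothetical wrapper, not part of MultiQC.
run_multiqc() {
    local project_dir="$1"
    if ! command -v multiqc >/dev/null 2>&1; then
        echo "multiqc not found on PATH" >&2
        return 127
    fi
    # MultiQC searches the given directory recursively;
    # -o sets where the report is written.
    multiqc "$project_dir" -o "$project_dir/multiqc_report"
}

# Example use:
# run_multiqc project
```

Running multiqc . from inside the project directory should do the same, with the report written to the current directory by default.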