Looking for a tool which provides mapping quality score distributions from BAM files
3
4
Entering edit mode
19 months ago
Thomas ▴ 160

Hello BioStars,

Is there a tool which generates mapping quality score distributions from bam files?

I know I could potentially do this myself, but I am looking for something which would essentially do the work for me, and also something which plugs in nicely to multiqc

I have had a quick search, and it is surprisingly difficult to find a tool which provides this functionality, so I am just wondering if I am missing something obvious here? The obvious candidate would be something like picard, but picard does not seem to offer this functionality from what I can tell

picard samtools multiqc • 2.7k views
1
Entering edit mode

There is no standard tool available possibly because of this issue.

0
Entering edit mode

Thank you. What I take from this is that the mapping quality is not necessarily meaningless but rather its implementation so far has been non-standard, and that the user ought to take care to really understand how their aligner has implemented the mapping quality score in the BAM/SAM output.

The link mentions BamQC (https://github.com/s-andrews/BamQC) which has the desired functionality, however, it is not yet integrated into MultiQC, although there are plans for this (https://github.com/ewels/MultiQC/issues/417)

6
Entering edit mode
18 months ago
$samtools view input.bam | cut -f 5 | sort | uniq -c | sort -n | awk '{printf("MAPQ:%s\t%d\n",$2,\$1);}' | gnuplot -e " set terminal dumb ;set nokey; plot '-' using 2:xtic(1) with boxes"

400 ++---+---------+---------+---------+--------+---------+----***********
|    +         +         +         +        +         +    *    +    *
350 ++                                                         *        +*
|                                                          *         *
|                                                          *         *
300 ++                                                         *        +*
|                                                          *         *
250 ++                                                         *        +*
|                                                          *         *
200 ++                                                         *        +*
|                                                          *         *
|                                                          *         *
150 ++                                                         *        +*
|                                                          *         *
100 ++                                                         *        +*
|                                                          *         *
|                                                          *         *
50 ++                                               ***********        +*
|    +         +         +         +   ***********    +    *    +    *
0 **********************************************************************
MAPQ:7    MAPQ:25   MAPQ:0    MAPQ:70  MAPQ:29   MAPQ:37   MAPQ:60

0
Entering edit mode

Thank you very much for this, this is greatly appreciated, and I apologise for the slow response to your helpful message

0
Entering edit mode

mark the answer(s) to be accepted by clicking on the green mark on the left.

0
Entering edit mode

Thanks, I already have. There is probably some lag/delay for you

3
Entering edit mode
19 months ago
Phil Ewels ★ 1.1k

I'm pretty sure that Qualimap BamQC (not to be confused with the BamQC tool from Babraham mentioned above) calculates and plots this. Qualimap is supported by MultiQC, however off the top of my head, I don't think that MultiQC replicates this plot in reports. I doubt it would be particularly difficult to add if you fancy it (or opening a GitHub issue with some example data), or you could probably sidestep that by using MultiQC's "custom content" feature.

0
Entering edit mode

Thank you very much, I will look into this as soon as I find some time

3
Entering edit mode
19 months ago
Dave Carlson ★ 1.2k

This may be too simple for what you need, but I have a script here that uses pysam to extract MapQ scores from a bam file and then generates a kernel density plot of the scores.

Here is the actual code:

#!/usr/bin/env python

import pysam
import seaborn as sns
import argparse
from matplotlib import pyplot as plt

parser = argparse.ArgumentParser(description="Plot mapping quality score from bam file")

parser.add_argument('--bam', required=True, help='path to input bam file', action='store')
parser.add_argument('--png', required=True, help='path to output png file', action='store')

args=parser.parse_args()

output = args.png
input = args.bam

bam = pysam.AlignmentFile(input, "rb")

sns.set_style("dark")
sns.set_context("paper")

plt.figure(figsize=(10,6))

ax = sns.kdeplot(quals, shade = True)
ax.set_ylabel('Density')
ax.set_xlabel('Mapping Quality Score')

plt.savefig(output)

0
Entering edit mode

Thank you very much. It is appreciated. I hope to give this a try soon.