Looking for a tool which provides mapping quality score distributions from BAM files
3
4
Entering edit mode
2.7 years ago
Thomas ▴ 160

Hello BioStars,

Is there a tool which generates mapping quality score distributions from bam files?

I know I could potentially do this myself, but I am looking for something which would essentially do the work for me, and also something which plugs in nicely to multiqc

I have had a quick search, and it is surprisingly difficult to find a tool which provides this functionality, so I am just wondering if I am missing something obvious here? The obvious candidate would be something like picard, but picard does not seem to offer this functionality from what I can tell

Thanks in advance!

picard samtools multiqc • 4.6k views
ADD COMMENT
1
Entering edit mode

There is no standard tool available possibly because of this issue.

ADD REPLY
0
Entering edit mode

Thank you. What I take from this is that the mapping quality is not necessarily meaningless but rather its implementation so far has been non-standard, and that the user ought to take care to really understand how their aligner has implemented the mapping quality score in the BAM/SAM output.

The link mentions BamQC (https://github.com/s-andrews/BamQC) which has the desired functionality, however, it is not yet integrated into MultiQC, although there are plans for this (https://github.com/ewels/MultiQC/issues/417)

ADD REPLY
6
Entering edit mode
2.6 years ago
$ samtools view input.bam | cut -f 5 | sort | uniq  -c | sort -n | awk '{printf("MAPQ:%s\t%d\n",$2,$1);}' | gnuplot -e " set terminal dumb ;set nokey; plot '-' using 2:xtic(1) with boxes"



  400 ++---+---------+---------+---------+--------+---------+----***********
      |    +         +         +         +        +         +    *    +    *
  350 ++                                                         *        +*
      |                                                          *         *
      |                                                          *         *
  300 ++                                                         *        +*
      |                                                          *         *
  250 ++                                                         *        +*
      |                                                          *         *
  200 ++                                                         *        +*
      |                                                          *         *
      |                                                          *         *
  150 ++                                                         *        +*
      |                                                          *         *
  100 ++                                                         *        +*
      |                                                          *         *
      |                                                          *         *
   50 ++                                               ***********        +*
      |    +         +         +         +   ***********    +    *    +    *
    0 **********************************************************************
        MAPQ:7    MAPQ:25   MAPQ:0    MAPQ:70  MAPQ:29   MAPQ:37   MAPQ:60
ADD COMMENT
0
Entering edit mode

Thank you very much for this, this is greatly appreciated, and I apologise for the slow response to your helpful message

ADD REPLY
0
Entering edit mode

mark the answer(s) to be accepted by clicking on the green mark on the left.

ADD REPLY
0
Entering edit mode

Thanks, I already have. There is probably some lag/delay for you

ADD REPLY
3
Entering edit mode
2.7 years ago
Phil Ewels ★ 1.4k

I'm pretty sure that Qualimap BamQC (not to be confused with the BamQC tool from Babraham mentioned above) calculates and plots this. Qualimap is supported by MultiQC, however off the top of my head, I don't think that MultiQC replicates this plot in reports. I doubt it would be particularly difficult to add if you fancy it (or opening a GitHub issue with some example data), or you could probably sidestep that by using MultiQC's "custom content" feature.

ADD COMMENT
0
Entering edit mode

Thank you very much, I will look into this as soon as I find some time

ADD REPLY
3
Entering edit mode
2.7 years ago
Dave Carlson ★ 1.7k

This may be too simple for what you need, but I have a script here that uses pysam to extract MapQ scores from a bam file and then generates a kernel density plot of the scores.

Here is the actual code:

#!/usr/bin/env python

import pysam
import seaborn as sns
import argparse
from matplotlib import pyplot as plt

parser = argparse.ArgumentParser(description="Plot mapping quality score from bam file")

parser.add_argument('--bam', required=True, help='path to input bam file', action='store')
parser.add_argument('--png', required=True, help='path to output png file', action='store')

args=parser.parse_args()

output = args.png
input = args.bam 

bam = pysam.AlignmentFile(input, "rb")

quals = [read.mapping_quality for read in bam.fetch()]

sns.set_style("dark")
sns.set_context("paper")

plt.figure(figsize=(10,6))

ax = sns.kdeplot(quals, shade = True)
ax.set_ylabel('Density')
ax.set_xlabel('Mapping Quality Score')

plt.savefig(output)
ADD COMMENT
0
Entering edit mode

Thank you very much. It is appreciated. I hope to give this a try soon.

ADD REPLY

Login before adding your answer.

Traffic: 2645 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6