Question: Extract Regions From A Bam To A Bed File Where The Average Mapping Quality Is Below 30.
0
gravatar for William
6.4 years ago by
William4.5k
Europe
William4.5k wrote:

How can I extract the regions with a mapping quality below 30 from a bam file to a bed file ?

I am not looking to extract the reads themselves but I just would like to know the regions in the bam with low average (below 30) mapping quality.

edit: I know that bam2Bed can export all the reads from a bam file to a bed file and that it also outputs the mapping quality as the bed score record. What I don't is how to go from there to a bed file with all the areas that have an average mapping quality below 30 ( for every position in that regoin). Or how to get there from another starting point.

So I am looking to compute the average mapping quality per covered position (nucleotide in the reference ) and then find all the regions that have an average mapping quality lower than 30.

bed bam • 4.7k views
ADD COMMENTlink modified 6.4 years ago by Jordan1.1k • written 6.4 years ago by William4.5k
3
gravatar for Jordan
6.4 years ago by
Jordan1.1k
Pittsburgh
Jordan1.1k wrote:

I think you can use bamtools to filter out the reads based in map quality.

bamtools filter -in file.bam -mapQuality "<30"

I'm not sure what you mean just to extract the regions. But if you just want to extract chr and its positions, then you can pipe it to samtools.

samtools view file.filtered.bam | cut -f 3,4,10
ADD COMMENTlink modified 6.4 years ago • written 6.4 years ago by Jordan1.1k
2
gravatar for Pierre Lindenbaum
6.4 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum124k wrote:

my tool samjs ( https://github.com/lindenb/jvarkit#samjs-filtering-a-sambam-file-with-javascript ) can use a javascript expression to filter the bam files.

If you want to get the mapped reads having a mapping quality < 40, the command would be

java -jar dist/samjs.jar I=input.bam \
    VALIDATION_STRINGENCY=SILENT \
    SCRIPT_EXPRESSION='!record.readUnmappedFlag && record.mappingQuality < 40' \
    OUT=result.bam

you can then convert result.bam to bed: see converting bam or sam files into bed files

ADD COMMENTlink written 6.4 years ago by Pierre Lindenbaum124k
1
gravatar for matted
6.4 years ago by
matted7.2k
Boston, United States
matted7.2k wrote:

The samtools mpileup command will generate a per-base pileup whose 6th column is an ASCII-encoded representation of the mapping qualities of all reads at that base (details here). For example:

chr10    1    N    3    ^5C^3c^!c    A?%
chr10    2    N    6    Cac^!A^3A^3a    @8%6B:
chr10    3    N    9    CccCCc^2C^2C^:c    A?%IA7+BA

You can write a short script to parse that out and write a minimal BED file subject to your conditions:

import sys, numpy

last = (None, None)

for line in sys.stdin:
    line = line.strip().split()

    if numpy.average([ord(c) - 33 for c in line[5]]) < 30:
        curr = line[0], int(line[1])

        if curr[0] != last[0] or curr[1] != last[1] + 1:
            if last[0] is not None: print "%s\t%d\t%d" % (curr[0], orig[1]-1, last[1])
            orig = curr
        last = curr

if last[0] is not None: print "%s\t%d\t%d" % (curr[0], orig[1]-1, last[1])

So you could run this like samtools mpileup in.bam | python low_mapq_to_bed.py > out.bed.

ADD COMMENTlink written 6.4 years ago by matted7.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2211 users visited in the last hour