Extract Regions From A Bam To A Bed File Where The Average Mapping Quality Is Below 30.
3
0
Entering edit mode
11.1 years ago
William ★ 5.3k

How can I extract the regions with a mapping quality below 30 from a bam file to a bed file ?

I am not looking to extract the reads themselves but I just would like to know the regions in the bam with low average (below 30) mapping quality.

edit: I know that bam2Bed can export all the reads from a bam file to a bed file and that it also outputs the mapping quality as the bed score record. What I don't is how to go from there to a bed file with all the areas that have an average mapping quality below 30 ( for every position in that regoin). Or how to get there from another starting point.

So I am looking to compute the average mapping quality per covered position (nucleotide in the reference ) and then find all the regions that have an average mapping quality lower than 30.

bam bed • 7.6k views
ADD COMMENT
3
Entering edit mode
11.1 years ago
Jordan ★ 1.3k

I think you can use bamtools to filter out the reads based in map quality.

bamtools filter -in file.bam -mapQuality "<30"

I'm not sure what you mean just to extract the regions. But if you just want to extract chr and its positions, then you can pipe it to samtools.

samtools view file.filtered.bam | cut -f 3,4,10
ADD COMMENT
2
Entering edit mode
11.1 years ago

my tool samjs ( https://github.com/lindenb/jvarkit#samjs-filtering-a-sambam-file-with-javascript ) can use a javascript expression to filter the bam files.

If you want to get the mapped reads having a mapping quality < 40, the command would be

java -jar dist/samjs.jar I=input.bam \
    VALIDATION_STRINGENCY=SILENT \
    SCRIPT_EXPRESSION='!record.readUnmappedFlag && record.mappingQuality < 40' \
    OUT=result.bam

you can then convert result.bam to bed: see converting bam or sam files into bed files

ADD COMMENT
1
Entering edit mode
11.1 years ago
matted 7.8k

The samtools mpileup command will generate a per-base pileup whose 6th column is an ASCII-encoded representation of the mapping qualities of all reads at that base (details here). For example:

chr10    1    N    3    ^5C^3c^!c    A?%
chr10    2    N    6    Cac^!A^3A^3a    @8%6B:
chr10    3    N    9    CccCCc^2C^2C^:c    A?%IA7+BA

You can write a short script to parse that out and write a minimal BED file subject to your conditions:

import sys, numpy

last = (None, None)

for line in sys.stdin:
    line = line.strip().split()

    if numpy.average([ord(c) - 33 for c in line[5]]) < 30:
        curr = line[0], int(line[1])

        if curr[0] != last[0] or curr[1] != last[1] + 1:
            if last[0] is not None: print "%s\t%d\t%d" % (curr[0], orig[1]-1, last[1])
            orig = curr
        last = curr

if last[0] is not None: print "%s\t%d\t%d" % (curr[0], orig[1]-1, last[1])

So you could run this like samtools mpileup in.bam | python low_mapq_to_bed.py > out.bed.

ADD COMMENT

Login before adding your answer.

Traffic: 3299 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6