How to select aligned reads below certain mapping Quality (from BWA)?
4
3
Entering edit mode
7.6 years ago
kirannbishwa01 ★ 1.6k

Samtools can be used to select reads above certain mapping quality.

samtools view -h -b -q 30 aligned.bam -o above.mapQ30.bam

But, how to select a read below certain mapping quality - all aligned reads below mapQ 30?

I know it can be done using awk. But, the pipeline gets lengthy and time consuming when first need to convert bam to sam - separate header - use awk for mapQ below 30 - add header - sam file - convert to bam. Really, its taking a lots of time.

Thanks,

bwa bam mapping quality alignment samtools • 15k views
ADD COMMENT
1
Entering edit mode

It may be possible to do this with one of the many programs from BBMap (perhaps reformat.sh). Brian Bushnell may be along later to provide that solution.

ADD REPLY
9
Entering edit mode
7.6 years ago

Use SAMtools with the -U flag to output reads that don't match your filter, e.g.

samtools view -h -b -q 30 -U below_q30.bam aligned.bam

ADD COMMENT
0
Entering edit mode

Thanks Harold. I just realized that this will only work when using samtools version 1.3. The version samtools 1.19 which is more popular and installed by default in many linux station doesn't have this flag available.

So, folks make sure you have proper version of samtools installed. i.e 1.3 http://www.htslib.org/doc/samtools.html . And, thanks much for the script.

ADD REPLY
0
Entering edit mode

There is slight modification required in this script.

ADD REPLY
2
Entering edit mode
7.6 years ago
kirannbishwa01 ★ 1.6k

I haven't found any straight forward method, but if someone has similar problem in the future; below is my proposed solution:

Note: Check and make sure the mapped alignment is sorted.

samtools view -h mapped.bam -o mapped.sam

grep '^@' mapped.sam > RG.header # this is important when there is header info in the sam/bam file

grep -v '^@' mapped.sam > SAM_file.sam # retains the non header part of the mapped file

awk '($5 < 40)' SAM_file.sam > Below.mapQ40.sam

cat RG.header Below.mapQ40.sam > Below.mapQ40_mapped.sam

samtools view -h -S -b Below.mapQ40_mapped.sam -o Below.mapQ40_mapped.bam

samtools index Below.mapQ40_mapped.bam

If you don't care about header info (or if its absent):

samtools view -h mapped.bam -o mapped.sam

awk '($5 < 40)' mapped.sam > Below.mapQ40_mapped.sam

samtools view -h -S -b Below.mapQ40_mapped.sam -o Below.mapQ40_mapped.bam

samtools index Below.mapQ40_mapped.bam

ADD COMMENT
2
Entering edit mode
7.6 years ago
kirannbishwa01 ★ 1.6k

Method B - requires samtools version 1.3

first select the mapped reads only samtools view -b -h -F 4 aligned.bam > mapped_only.bam # this selects for mapped reads only.

samtools index mapped_only.bam # index the bam file

Now separate the mapped reads above mapQ 40 and below samtools -b -h -U below_mapQ40.bam -q 40 mapped_only.bam -o above_mapQ40.bam # make sure to use right samtools version (1.3)

samtools index above_mapQ40.bam # this is bam file with reads with mapQ 40 and above

samtools index below_mapQ40.bam # this is bam file with reads with mapQ below 40.

ADD COMMENT
0
Entering edit mode
7.6 years ago

I don't know if it's faster, but you could subtract the ≥Q30 file from the total using BEDtools intersect with -v flag.

Edit: The OP is correct; this was a bone-headed suggestion. See other answer.

ADD COMMENT
0
Entering edit mode

Bedtools rely on interval (co-ordinates; region of the genome). So, in the same interval (region of the genome) there can be reads with both high mapQ and low mapQ. So, this won't end up working. I haven't tried it but I think this is what will happen.

still thinking !!!

ADD REPLY

Login before adding your answer.

Traffic: 1499 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6