Extract reads where both pairs maps properly and none of the pairs are split from a bam file
1
0
Entering edit mode
6.7 years ago

Hi,

I want to extract only those reads from bam file in a given region where both pairs are mapped properly and are not split.

Currently I am doing:

samtools view -h -f 2 -F 4 input.bam "chr:start-stop" | awk '$6 !~ /S/ || $1 ~ /@/' | samtools view -bS - > output.proper.nosplit.bam

Where awk '$6 !~ /S/ || $1 ~ /@/' removes the split reads. However, I also want the properly mapped pair of the split read to be removed.

I tried to run samtools view -h -f 2 -F 4 output.proper.nosplit.bam > output2.sam, but it is still not removing the pair of the split read.

Any idea?

Thanks in advance

samtools mapping genome alignment sequencing • 2.0k views
ADD COMMENT
3
Entering edit mode
6.7 years ago

create a list of read-names that are not properly mapped or unmapped or contain a soft clip or a hard clip.

samtools view  -F 2  in.bam | cut -f 1 > out.1
samtools view  -f4  in.bam | cut -f 1 >> out.1
samtools view -F4 in.bam |  awk '$6 ~ /S/ || $6 ~ /H/' | cut -f1 >> out.1

sort / uniq the list , sort your bam on query-name and use the list with https://broadinstitute.github.io/picard/command-line-overview.html#FilterSamReads with FILTER=excludeReadList

ADD COMMENT
0
Entering edit mode

can we use 2048 flag (and/or 256) to filter out split reads from the region of interest?

ADD REPLY
0
Entering edit mode

Depends on which flag your aligner writes the split reads to. To remove just the secondary/supplementary alignments I would use the relevant SAM flag, to remove all records that are split reads I would check for the presence of the SA tag.

ADD REPLY
0
Entering edit mode

thanks @d-cameron

ADD REPLY

Login before adding your answer.

Traffic: 2768 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6