paired-end BAM file sequence search
1
0
Entering edit mode
7.2 years ago
igor 13k

It's easy to search for a particular sequence in a BAM file with samtools view | grep. This works well with individual reads. However, what if you are trying to find a sequence in paired-end reads (meaning it's present in either one)?

Is there a better solution than to retrieve the read IDs that match your sequence of interest and then search for those read IDs to get both mates?

sequence alignment • 1.2k views
ADD COMMENT
0
Entering edit mode
7.2 years ago
Gabriel R. ★ 2.9k

if you do not care about the order of the reads and do not want to script, sort your BAM file by readname, then run:

  cat <(samtools view  -H  in.bam ) <(samtools view  -f 64 in.bam  |grep -A 1 SEQHERE) <(samtools view  -f 128 in.bam  |grep -B 1 SEQHERE) | samtools view -bS /dev/stdin  > out.bam
ADD COMMENT
2
Entering edit mode

I don't really understand how it could work with 'samtools view -f 64 in.bam| grep -A 1' ?

ADD REPLY
0
Entering edit mode

As per the comment below, yes I assumed wrongly that the BAM file is sorted wrt to names. I edited my post.

ADD REPLY
1
Entering edit mode

Interesting. I guess that assumes read name sorting and that each read has a mate. However, if the sequence is found in both mates, the pair would get duplicated.

ADD REPLY
1
Entering edit mode

but 1) flag 64 is for first read in pair, so you will never find the second read in pair in the very next line. 2) reads of the same pair are not always on two successive lines.

ADD REPLY
0
Entering edit mode

Thank you Igor and Pierre! I use BAM files for unaligned data and forget BAM files can be aligned reads (sorted) as well :-P I edited my post.

ADD REPLY

Login before adding your answer.

Traffic: 2387 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6