Extract reads with insert size 0 from a bam file
2.8 years ago
jeni ▴ 70

Hi!

I have been looking at this post, How To Extract Reads With Wrong Insert Size With Samtools, and I wonder how I could extract from a BAM file just those reads with an insert size of 0.

I have been trying

samtools stats -i 0 sample.bam


Thanks!

alignment • 1.9k views

The command to extract records from a SAM/BAM file is samtools view (as shown in that post) - I'm not sure where you're getting samtools stats from.


But how can I get them? I have not seen any flag for this kind of read.


What do you mean by an insert size of 0?


I would like to obtain the paired reads that, for some strange reason, are mapped to the same position and therefore have no insert size.


The only scenario where I suppose that can happen is if the insert size equals the number of sequencing cycles (i.e. the read length). So you would need to find paired reads that have an identical mapping start on the same chromosome. Otherwise it would be unlikely.
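A minimal sketch of that check, assuming the alignments are streamed as SAM text (for example from samtools view -h). It compares the positions directly rather than relying on TLEN, since a pair starting at the same coordinate may still carry a nonzero TLEN depending on how the aligner computes it. The function name same_start_pairs is made up for illustration, and the FLAG bits are tested with integer arithmetic so it should work in any POSIX awk.

```shell
# Hypothetical helper: reads SAM text on stdin and prints the header plus
# paired records whose mate starts at the same position on the same reference.
same_start_pairs() {
  awk -F'\t' '
    /^@/ { print; next }                       # pass header lines through
    {
      paired      = int($2) % 2                # FLAG bit 0x1: read is paired
      unmapped    = int($2 / 4) % 2            # FLAG bit 0x4: read unmapped
      mate_unmap  = int($2 / 8) % 2            # FLAG bit 0x8: mate unmapped
      same_ref    = ($7 == "=" || $7 == $3)    # RNEXT matches RNAME
      same_pos    = ($4 == $8)                 # POS equals PNEXT
      if (paired && !unmapped && !mate_unmap && same_ref && same_pos) print
    }'
}
```

Something like samtools view -h sample.bam | same_start_pairs | samtools view -b -o same_start.bam - should then write the matching records back to BAM (untested on real data).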

2.8 years ago

To see which characteristics are available via the FLAG, check out the Broad Institute's Explain Flag Site. As you can see, the insert size is not part of that list. However, depending on the alignment software you used, the cases you've described may be captured by the flag for "proper pair".

The fragment size is usually recorded in the 9th column (TLEN), if I remember correctly (just double-checked and found this post that goes into more details; definitely read through the post and SAM documentation linked there to get a better understanding of the SAM format).

samtools view myfile.bam | awk '$9 == 0 {print $0}' might do the trick then (I have not tested this, and it's probably not very fast either). It seems like bamtools' filtering tool might offer the same functionality.
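A slightly expanded sketch of the same awk idea, which keeps the header lines so the output can be converted back to BAM. One caveat from the SAM specification: TLEN is also set to 0 for single-segment templates and when the information is unavailable, so a plain $9 == 0 filter will likely pick up unpaired reads as well. This version therefore also requires the paired bit (0x1) in the FLAG; drop that condition if you want every record with TLEN 0. The function name tlen_zero_pairs is hypothetical.

```shell
# Hypothetical helper: reads SAM text on stdin and prints the header plus
# paired records whose TLEN (column 9) is 0.
tlen_zero_pairs() {
  awk -F'\t' '
    /^@/ { print; next }            # keep header lines
    int($2) % 2 == 1 && $9 == 0     # paired bit set and TLEN is 0
  '
}
```

Assuming samtools is installed, usage would look like samtools view -h myfile.bam | tlen_zero_pairs | samtools view -b -o tlen0.bam - (note the -h, without which the header never reaches awk).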

EDIT: Istvan offered a more elegant solution in this post