Strange Behavior Filtering Reads With Samtools
1
1
Entering edit mode
12.4 years ago
Linda ▴ 160

I am trying to filter the results of a paired end alignment with samtools. I would like to get all reads where both the pairs are unaligned. I tried

samtools view  -f 12 R28.bam | cut -f1-9|head
HWUSI-EAS570R_0028:2:96:4625:6148#0    141    Contig18298    1907    0    100M    =    1907    0
HWUSI-EAS570R_0028:1:86:7595:20840#0    141    Contig18298    1916    25    100M    =    1916    0
HWUSI-EAS570R_0028:3:120:15625:9857#0    93    Contig18301    2004    25    100M    =    2004    0
HWUSI-EAS570R_0028:5:89:9474:1987#0    157    Contig18321    1910    37    6M2D2M1I91M    =    1910    0
HWUSI-EAS570R_0028:5:53:6864:4533#0    141    Contig18345    1921    25    100M    =    1921    0
HWUSI-EAS570R_0028:1:27:10901:2866#0    93    Contig18367    1918    0    100M    =    1918    0
HWUSI-EAS570R_0028:1:36:7563:3669#0    93    Contig18367    1918    0    100M    =    1918    0
HWUSI-EAS570R_0028:1:37:12448:8115#0    93    Contig18367    1918    0    100M    =    1918    0
HWUSI-EAS570R_0028:2:68:9495:3274#0    157    Contig18367    1918    0    100M    =    1918    0
HWUSI-EAS570R_0028:3:20:9943:7308#0    93    Contig18367    1918    0    100M    =    1918    0

The 100M indicates to me that these are matching the reference sequence. So I tried a two step filter.

samtools view -b -f 4 R28.bam | samtools view -f 8 - |cut -f1-9|head

HWUSI-EAS570R_0028:2:96:4625:6148#0    141    Contig18298    1907    0    100M    =    1907    0
HWUSI-EAS570R_0028:1:86:7595:20840#0    141    Contig18298    1916    25    100M    =    1916    0
HWUSI-EAS570R_0028:3:120:15625:9857#0    93    Contig18301    2004    25    100M    =    2004    0
HWUSI-EAS570R_0028:5:89:9474:1987#0    157    Contig18321    1910    37    6M2D2M1I91M    =    1910    0
HWUSI-EAS570R_0028:5:53:6864:4533#0    141    Contig18345    1921    25    100M    =    1921    0
HWUSI-EAS570R_0028:1:27:10901:2866#0    93    Contig18367    1918    0    100M    =    1918    0
HWUSI-EAS570R_0028:1:36:7563:3669#0    93    Contig18367    1918    0    100M    =    1918    0
HWUSI-EAS570R_0028:1:37:12448:8115#0    93    Contig18367    1918    0    100M    =    1918    0
HWUSI-EAS570R_0028:2:68:9495:3274#0    157    Contig18367    1918    0    100M    =    1918    0
HWUSI-EAS570R_0028:3:20:9943:7308#0    93    Contig18367    1918    0    100M    =    1918    0

Now, this is strange, because I thought the first filter should have removed all reads that aligned anywhere? So what is going on here? My aim is to collect all unaligned pairs.

samtools filter bam • 2.3k views
ADD COMMENT
2
Entering edit mode
12.4 years ago

The SAM specification states that:

Bit 0x4 is the only reliable place to tell whether the segment is unmapped. If 0x4 is set, no assumptions can be made about RNAME, POS, CIGAR, MAPQ, bits 0x2, 0x10 and 0x100 and the bit 0x20 of the next segment in the template.

This means that you don't need to worry about the other fields since those are not meaningful. These are probably not cleared due to some optimization reason.

ADD COMMENT

Login before adding your answer.

Traffic: 1628 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6