Question: Strange Behavior Filtering Reads With Samtools
1
gravatar for Linda
8.6 years ago by
Linda160
Linda160 wrote:

I am trying to filter the results of a paired end alignment with samtools. I would like to get all reads where both the pairs are unaligned. I tried

samtools view  -f 12 R28.bam | cut -f1-9|head
HWUSI-EAS570R_0028:2:96:4625:6148#0    141    Contig18298    1907    0    100M    =    1907    0
HWUSI-EAS570R_0028:1:86:7595:20840#0    141    Contig18298    1916    25    100M    =    1916    0
HWUSI-EAS570R_0028:3:120:15625:9857#0    93    Contig18301    2004    25    100M    =    2004    0
HWUSI-EAS570R_0028:5:89:9474:1987#0    157    Contig18321    1910    37    6M2D2M1I91M    =    1910    0
HWUSI-EAS570R_0028:5:53:6864:4533#0    141    Contig18345    1921    25    100M    =    1921    0
HWUSI-EAS570R_0028:1:27:10901:2866#0    93    Contig18367    1918    0    100M    =    1918    0
HWUSI-EAS570R_0028:1:36:7563:3669#0    93    Contig18367    1918    0    100M    =    1918    0
HWUSI-EAS570R_0028:1:37:12448:8115#0    93    Contig18367    1918    0    100M    =    1918    0
HWUSI-EAS570R_0028:2:68:9495:3274#0    157    Contig18367    1918    0    100M    =    1918    0
HWUSI-EAS570R_0028:3:20:9943:7308#0    93    Contig18367    1918    0    100M    =    1918    0

The 100M indicates to me that these are matching the reference sequence. So I tried a two step filter.

samtools view -b -f 4 R28.bam | samtools view -f 8 - |cut -f1-9|head

HWUSI-EAS570R_0028:2:96:4625:6148#0    141    Contig18298    1907    0    100M    =    1907    0
HWUSI-EAS570R_0028:1:86:7595:20840#0    141    Contig18298    1916    25    100M    =    1916    0
HWUSI-EAS570R_0028:3:120:15625:9857#0    93    Contig18301    2004    25    100M    =    2004    0
HWUSI-EAS570R_0028:5:89:9474:1987#0    157    Contig18321    1910    37    6M2D2M1I91M    =    1910    0
HWUSI-EAS570R_0028:5:53:6864:4533#0    141    Contig18345    1921    25    100M    =    1921    0
HWUSI-EAS570R_0028:1:27:10901:2866#0    93    Contig18367    1918    0    100M    =    1918    0
HWUSI-EAS570R_0028:1:36:7563:3669#0    93    Contig18367    1918    0    100M    =    1918    0
HWUSI-EAS570R_0028:1:37:12448:8115#0    93    Contig18367    1918    0    100M    =    1918    0
HWUSI-EAS570R_0028:2:68:9495:3274#0    157    Contig18367    1918    0    100M    =    1918    0
HWUSI-EAS570R_0028:3:20:9943:7308#0    93    Contig18367    1918    0    100M    =    1918    0

Now, this is strange, because I thought the first filter should have removed all reads that aligned anywhere? So what is going on here? My aim is to collect all unaligned pairs.

filter bam samtools • 1.7k views
ADD COMMENTlink modified 9 months ago by Biostar ♦♦ 20 • written 8.6 years ago by Linda160
2
gravatar for Istvan Albert
8.6 years ago by
Istvan Albert ♦♦ 84k
University Park, USA
Istvan Albert ♦♦ 84k wrote:

The SAM specification states that:

Bit 0x4 is the only reliable place to tell whether the segment is unmapped. If 0x4 is set, no assumptions can be made about RNAME, POS, CIGAR, MAPQ, bits 0x2, 0x10 and 0x100 and the bit 0x20 of the next segment in the template.

This means that you don't need to worry about the other fields since those are not meaningful. These are probably not cleared due to some optimization reason.

ADD COMMENTlink written 8.6 years ago by Istvan Albert ♦♦ 84k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 764 users visited in the last hour