extract unmapped reads from WGS data - which samtools parameters to choose
3.0 years ago
Palgrave ▴ 110

I have seen at least two methods for extracting the unmapped reads from a paired-end WGS dataset. Does it make any difference which one I choose? What is the advantage of the first?

Method 1: samtools view -f 1 -F 3842 | samtools view -f 12 -F 3328 -

Method 2: samtools view -f 4

WGS Sequencing • 1.2k views

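-F = Filter out (exclude reads that have any of the listed flag bits set)

-f = Keep (keep only reads that have all of the listed flag bits set)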

Explanation of SAM flags (LINK for lookup)

1 = read paired

3842 =  read mapped in proper pair (0x2)*
    not primary alignment (0x100)
    read fails platform/vendor quality checks (0x200) 
    read is PCR or optical duplicate (0x400) 
    supplementary alignment (0x800)

12 =   read unmapped (0x4)
    mate unmapped (0x8)*

3328 = not primary alignment (0x100)
    read is PCR or optical duplicate (0x400)
    supplementary alignment (0x800)

For method 2:

4 = read unmapped (0x4)
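
The decompositions above can be double-checked with a few lines of Python. This is only a minimal sketch: the bit values and descriptions are the standard SAM flag definitions, while the names `SAM_FLAGS` and `decode_flag` are made up for this illustration and are not part of samtools.

```python
# SAM flag bits and their meanings, as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails platform/vendor quality checks",
    0x400: "read is PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(value):
    """Return the descriptions of all bits set in a SAM flag value.

    Hypothetical helper for illustration only.
    """
    return [name for bit, name in SAM_FLAGS.items() if value & bit]


# Decode every flag value used in the two methods.
for value in (1, 3842, 12, 3328, 4):
    print(value, hex(value), decode_flag(value))
```

The only difference between 3842 and 3328 is that 3842 additionally contains the "proper pair" (0x2) and "fails quality checks" (0x200) bits.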


Thanks for showing this. But my question is really whether there is any biological justification for choosing the simple approach (method 2) over the more comprehensive filtering (method 1).


Perhaps not for simple whole genome sequencing, but if you are going to call variants you will want to take into account all the other filters that are in method 1.
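
To make the practical difference between the two methods concrete, here is a rough Python sketch of the -f/-F logic applied to a few flag values. The function names and the example flag values are assumptions chosen for illustration; only the rule itself (-f requires all listed bits, -F rejects any listed bit) reflects how samtools view filters.

```python
def passes(flag, require=0, exclude=0):
    """Mimic samtools view -f/-F on a single flag value.

    Keep the read if every bit in `require` is set (-f)
    and no bit in `exclude` is set (-F).
    """
    return (flag & require) == require and (flag & exclude) == 0


def method1(flag):
    # samtools view -f 1 -F 3842 | samtools view -f 12 -F 3328 -
    return passes(flag, require=1, exclude=3842) and passes(
        flag, require=12, exclude=3328
    )


def method2(flag):
    # samtools view -f 4
    return passes(flag, require=4)


# Illustrative flag values (not taken from real data).
examples = {
    77: "paired, read and mate unmapped, first in pair",
    141: "paired, read and mate unmapped, second in pair",
    69: "paired, read unmapped but mate mapped, first in pair",
    73: "paired, read mapped but mate unmapped, first in pair",
    589: "same as 77, but fails quality checks",
    1101: "same as 77, but marked as duplicate",
    4: "unpaired, unmapped",
}

for flag, description in examples.items():
    print(flag, description, "method 1:", method1(flag), "method 2:", method2(flag))
```

Under these assumptions, method 2 keeps every read whose own unmapped bit is set, whereas method 1 keeps only reads from pairs in which both mates are unmapped, and it also drops reads that fail quality checks or are marked as duplicates.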


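Just to add to this: if you are using Bowtie2, you could write out the unaligned reads at alignment time.

For this, you could add the option --un-conc as per the manual, which writes the pairs that fail to align concordantly. (--al-conc does the opposite and writes the pairs that do align concordantly.) Note that --un-conc also includes pairs that align discordantly, so it is broader than "both mates unmapped".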
