Trying to understand samtools -f and -F
0
0
Entering edit mode
11 months ago
shibl_a ▴ 20

I'm trying to understand the samtools flags -f and -F better...

In a scenario where I am trying to retrieve 'clean' bacterial reads from a metagenomic sample that contains host reads and bacterial reads, I first use bowtie2 to map the reads to an index made of the host genome then I use the below command to get everything that has been unmapped (i.e. everything that is NOT the host):

samtools view --threads 60 -b -f 12 -F 256 SAMPLE_mapped_and_unmapped.bam > SAMPLE_bothReadsUnmapped.bam


Now, If I want to repeat this step but this time to get the mapped reads (i.e. everything that is NOT bacteria), what would be the opposite of -f 12 -F 256? would it be -f 2 -F 12? where -f 12 = output all reads unmapped and mates unmapped and -F 256 = do not output reads that shows primary alignment and -f 2 = output all reads that mapped in pairs and -F 12 = do not output reads that were unmapped and mates that were unmapped

Please correct me if my understanding of those flags/numbers in these simple terms is inaccurate. Thanks.

samtools bowtie2 alignments mapping • 713 views
0
Entering edit mode

I am not answering your question directly but want to point out that you can use bbsplit.sh (part of BBMap suite) to bin your reads against multiple genomes in one step. A different program called seal.sh can do this by k-mer matching. A guide for seal is available.

0
Entering edit mode

Thanks for that - I'm familiar with bbsplit.sh but not seal.sh. I plan to look at seal.sh more closely but do you happen to know if it would work well with a mixed metagenomic sample (i.e. prokaryotic + eukaryotic reads)?

0
Entering edit mode

I don't think your -F256 is adding anything, since kind of by definition, unmapped reads won't have secondary alignments.

And of course, if you have genomes for your bacterial contamination, you need to eventually align to a combined genome of them and the host, because aligning to host alone is likely going to result in the aligner forcing some contaminant reads to align where they do not belong.

0
Entering edit mode

My understanding is that -F 256 is avoiding adding mapped reads to the output...and so I was wondering what its 'opposite' would be. This is from an example I found here: https://sites.google.com/site/wiki4metagenomics/tools/short-read/remove-host-sequences

You're right it is very likely that some reads will align where they shouldn't and that's why I'm going through a series of bowtie2 alignment trials while trying to wrap my head around these flags in simpler terms.

0
Entering edit mode

No, that's not what the 256 flag means. That's not even what your documentation says it means. But you could always try it with and without and count to see if the # of reads changes.

0
Entering edit mode

secondary: 0x100 bit set

supplementary: 0x800 bit set

duplicates: 0x400 bit set

mapped: 0x4 bit not set

paired in sequencing: 0x1 bit set

read1: both 0x1 and 0x40 bits set

read2: both 0x1 and 0x80 bits set

properly paired: both 0x1 and 0x2 bits set and 0x4 bit not set

with itself and mate mapped: 0x1 bit set and neither 0x4 nor 0x8 bits set

singletons: both 0x1 and 0x8 bits set and bit 0x4 not set

For example, properly paired would be:

samtools view -b -o properly_paired.bam -f 0x1 -f 0x2 -F 0x4 original.bam