What does samtools stats -F filter on, and when is the filtering applied?
3.5 years ago

I have an alignment of merged paired-end Illumina short reads against de novo assembled PacBio long reads, generated by:

bwa mem $reference $short > $outdir/$prefix.sam

I want to check the quality of the alignment, so I am running samtools stats with the -F flag, but I am unsure what this flag is actually showing me:

samtools stats $outdir/$prefix.sam
raw total sequences:    1815002
filtered sequences:     0
sequences:      1815002

samtools stats -F $outdir/$prefix.sam
raw total sequences:    1819392
filtered sequences:     911247
sequences:      908145

The samtools documentation is quite limited on what this is doing/showing.

What are the filtering parameters, and when did this filtering occur? Is it just the default filtering bwa mem does, or something like that?


Use these sites, which explain the various SAM flags in plain English:

https://broadinstitute.github.io/picard/explain-flags.html
https://www.samformat.info/sam-format-flag-single (check the links at top of page to change format)

"Is it just the default filtering bwa mem does, or something like that?"

What do you mean by that? An aligner should not be filtering anything. Soft-clipping of bases in a read that don't map is probably the closest thing.


@genomax Thank you for your response. I can't see anything to do with the -F option on these sites, and I can't find a numerical (etc.) representation of -F anywhere. This seems to be a samtools-specific thing, and I am unsure if and how it is encoded in the SAM file.

I too was under the impression that aligners do no filtering, but then became confused by this samtools option. What I meant was that I am trying to understand where this filtering is applied and what parameters it is bound by.


Have you looked at the SAM format specification document? On page 7 you will find a description of the bitwise flags. You will find these flags in column 2 of every SAM alignment line.
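
To make the bitwise encoding concrete, here is a minimal sketch in plain shell that decodes a FLAG value (as found in column 2) into named bits. The bit values are the ones defined in the SAM specification; the function name `decode_flag` and the short lowercase labels are made up for this illustration and are not part of samtools.

```shell
# Decode a SAM FLAG value into the names of the bits that are set.
# decode_flag and the labels below are hypothetical helpers for illustration.
decode_flag() {
    flag=$1
    names=""
    for entry in \
        1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
        16:reverse 32:mate_reverse 64:read1 128:read2 \
        256:secondary 512:qc_fail 1024:duplicate 2048:supplementary
    do
        bit=${entry%%:*}    # numeric value of the bit
        name=${entry#*:}    # label for the bit
        # A bit is set when the bitwise AND with the flag is non-zero.
        if [ $(( flag & bit )) -ne 0 ]; then
            names="${names:+$names,}$name"
        fi
    done
    printf '%s\n' "${names:-none}"
}

decode_flag 256    # prints: secondary
decode_flag 2064   # prints: reverse,supplementary
```

samtools also ships a `samtools flags` subcommand that does this kind of conversion, which is probably the better tool in practice.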

With the -F option you specify which of these flags a samtools program should act on. A flag can be written as a decimal number or as a hexadecimal value (in the example below, the first value is decimal and the second is hex). The following example represents a read marked as a secondary alignment.

256            0x100        secondary alignment

The sites I linked above explain what these codes mean (if you are starting with, say, 0x100) or which code to pass to -F if you want a certain operation. I don't know off the top of my head what the default value for -F is. That is what is being used to filter your data in your original post.
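
As a rough sketch of the filtering logic (not the actual samtools implementation): as far as I understand the samtools stats manual, -F takes a flag value as its argument, and a record counts as filtered when it has any bit of that value set. The helper names `is_filtered` and `count_filtered` and the list of FLAG values below are invented for the illustration.

```shell
# is_filtered FLAG MASK: succeeds when FLAG has at least one bit of MASK set.
# Both helpers are hypothetical; they only mimic the assumed -F semantics.
is_filtered() {
    [ $(( $1 & $2 )) -ne 0 ]
}

# count_filtered MASK FLAG...: tally how many FLAG values the mask would drop.
count_filtered() {
    mask=$1
    shift
    filtered=0
    kept=0
    for flag in "$@"; do
        if is_filtered "$flag" "$mask"; then
            filtered=$(( filtered + 1 ))
        else
            kept=$(( kept + 1 ))
        fi
    done
    printf 'filtered=%d kept=%d\n' "$filtered" "$kept"
}

# Mask 0x100 (secondary) against five made-up FLAG values:
# 256 and 272 (256 + 16) have the secondary bit set, the others do not.
count_filtered 0x100 0 16 256 272 2048   # prints: filtered=2 kept=3
```

So a real call would look something like `samtools stats -F 0x100 file.sam`, with the flag value given explicitly before the file name.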


Ah okay, so -F is filtering based on the SAM flags present in column 2? Thank you :)
