SAM/BAM files are the kind of files with hidden treasures that still need to be uncovered by the inexperienced user. That is me...
My case: after analyzing a BAM file, I asked myself how many reads' mates remain unmapped.
One possibility to answer this is to analyze the FLAG values with samtools.
As I understand it, each FLAG value is a unique combination (a sum) of individual bit flags. So all of these FLAG values: 73, 89, 121, 153, 185, 137, 77, 141 and so on, contain the "8", which in turn should indicate that the mate read is unmapped. I got this information from this web page, which gives an idea of what information the FLAGs can provide.
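To check that each of those values really contains the 8, a FLAG can be split into the powers of two it is built from. The bash sketch below does this; the helper name `explain_flag` is hypothetical, and the short bit descriptions are paraphrased from the SAM specification (bit i has value 2^i).

```shell
#!/usr/bin/env bash
# Sketch: split a SAM FLAG into the powers of two it is built from.
# Descriptions paraphrase the SAM specification; index i <-> value 2^i.
bit_names=(
  "paired (multiple segments)"
  "properly paired"
  "read unmapped"
  "mate unmapped"
  "read reverse strand"
  "mate reverse strand"
  "first in pair"
  "second in pair"
  "secondary alignment"
  "fails QC"
  "PCR or optical duplicate"
  "supplementary alignment"
)

# Hypothetical helper: print "FLAG = a+b+..." and one line per set bit.
explain_flag() {
  local flag=$1 i parts=() names=()
  for (( i = 0; i < ${#bit_names[@]}; i++ )); do
    if (( flag & (1 << i) )); then
      parts+=( "$(( 1 << i ))" )
      names+=( "  $(( 1 << i )): ${bit_names[i]}" )
    fi
  done
  local IFS=+                 # join the parts with "+"
  echo "$flag = ${parts[*]}"
  printf '%s\n' "${names[@]}"
}

# The FLAG values listed above; each one should show the 8 bit.
for flag in 73 89 121 153 185 137 77 141; do
  explain_flag "$flag"
done
```

For example, 73 comes out as 1+8+64 (paired, mate unmapped, first in pair) and 141 as 1+4+8+128, so both include the 8.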
Now a summary. To answer this question, I analyzed a single BAM file in two ways:
- One is counting the number of "*" values in column 7 (RNEXT), because according to the official SAM specification this could mean that the mate is unmapped (the specification says this field "is set as ‘*’ when the information is unavailable"). This way I got over 65000 sequences whose mate could be unmapped.
- However, if I run `samtools view file.bam -f 8 | wc -l`, I end up with only 2903 sequences.
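One way to see where the two numbers diverge is to count both criteria on the same records and also look at their overlap. The bash sketch below reads SAM text on stdin (for example from `samtools view file.bam`); the helper name `count_mate_info` is hypothetical. It also counts "*" records without bit 1 (paired), because the SAM specification says that when bit 0x1 is unset, no assumptions can be made about bit 0x8.

```shell
#!/usr/bin/env bash
# Sketch: tabulate the two counting approaches on the same records.
# Input: SAM text on stdin. Column 2 is FLAG, column 7 is RNEXT.

# Hypothetical helper, not part of samtools.
count_mate_info() {
  local star=0 bit8=0 star_bit8=0 star_unpaired=0
  local qname flag rname pos mapq cigar rnext rest
  while IFS=$'\t' read -r qname flag rname pos mapq cigar rnext rest \
        || [[ -n $qname ]]; do
    [[ $qname == @* ]] && continue            # skip header lines, if any
    if (( flag & 8 )); then
      bit8=$(( bit8 + 1 ))                    # records that -f 8 would keep
    fi
    if [[ $rnext == "*" ]]; then
      star=$(( star + 1 ))                    # records with RNEXT "*"
      if (( flag & 8 )); then
        star_bit8=$(( star_bit8 + 1 ))        # counted by both approaches
      fi
      if (( (flag & 1) == 0 )); then
        star_unpaired=$(( star_unpaired + 1 ))  # bit 1 unset: not paired
      fi
    fi
  done
  echo "RNEXT '*': $star"
  echo "FLAG bit 8 set: $bit8"
  echo "RNEXT '*' and bit 8 set: $star_bit8"
  echo "RNEXT '*' and bit 1 unset: $star_unpaired"
}
```

Usage, assuming the function has been defined in the current shell: `samtools view file.bam | count_mate_info`. The overlap line shows how many records both methods agree on, and the last line shows how many "*" records are not flagged as paired at all.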
One possibility is that, with the -f option, the program looks for a lone "8" in the FLAG field. But when I look at column 2 of the BAM file, I don't find any FLAG equal to just 8. That convinced me that samtools view -f FLAG matches any FLAG value that contains the 8 bit, and thus it should tell me how many reads have an unmapped mate.
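The samtools view documentation describes `-f INT` as keeping only alignments with all bits of INT set in FLAG, and `-F INT` as dropping alignments with any bit of INT set. The bash sketch below re-implements just that bitwise test for illustration; `passes_f` and `passes_F` are hypothetical helpers, not samtools code.

```shell
#!/usr/bin/env bash
# Sketch of the bitwise test behind `samtools view -f INT` / `-F INT`.

# -f MASK: keep a record only if ALL bits of MASK are set in FLAG.
passes_f() {
  local flag=$1 mask=$2
  (( (flag & mask) == mask ))
}

# -F MASK: drop a record if ANY bit of MASK is set in FLAG.
passes_F() {
  local flag=$1 mask=$2
  (( (flag & mask) == 0 ))
}

# No FLAG needs to be exactly 8: 73, 77 and 141 pass -f 8; 99 and 147 do not.
for flag in 73 77 99 141 147; do
  if passes_f "$flag" 8; then
    echo "$flag: kept by -f 8"
  else
    echo "$flag: removed by -f 8"
  fi
done
```

Combining both tests, `passes_f "$flag" 8 && passes_F "$flag" 4`, mirrors `samtools view -f 8 -F 4`, which keeps reads that are mapped themselves but whose mate is not. `samtools view -c` prints the count directly instead of piping to `wc -l`.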
With all this information, I am still not fully confident about the right answer to this question. Or I have serious doubts about how useful the -f option of the samtools view command really is. Or maybe many other "lonely" flags should be included in the search, because I am missing some important information and/or the orientation does not seem to matter when the BAM file is generated.