Question: Struggling with the information contained in a BAM/SAM file
Asked 3.5 years ago by Antonio R. Franco (Universidad de Córdoba, Spain):

SAM/BAM files are the kind of file with hidden treasures that need to be uncovered by the inexperienced user. That is me...

My case: after analyzing a BAM file, I asked myself how many reads have a mate that remains unmapped.

One possibility for answering this is to analyze the FLAG values with samtools.

I understand that a FLAG value is a unique combination (a sum) of individual bit flags. So all of these FLAG values: 73, 89, 121, 153, 185, 137, 77, 141 and so on, contain the "8", which in turn should indicate that the mate read remains unmapped. I got this information from this web page, which gives an idea of what information the FLAGs can provide.
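To check which individual bits a FLAG value contains, it can be decoded against the bit table of the SAM specification. Below is a minimal Python sketch; the names SAM_FLAG_BITS and decode_flag are hypothetical, and the short labels (e.g. "mate unmapped" for the spec's "next segment in the template unmapped") are my own shorthand rather than the spec's wording.

```python
# Bit values from the FLAG table of the SAM specification.
# The labels are shorthand, not the specification's exact wording.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary",
    0x200: "QC fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list:
    """Return the labels of every bit that is set in a SAM FLAG value."""
    return [label for bit, label in SAM_FLAG_BITS.items() if flag & bit]


for flag in (73, 89, 121, 153, 185, 137, 77, 141):
    # Every one of these values includes bit 0x8 ("mate unmapped").
    print(flag, decode_flag(flag))
```

For example, 73 = 1 + 8 + 64 decodes to paired, mate unmapped and first in pair, while 77 additionally has 0x4 set, meaning the read itself is unmapped as well.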

Now a summary. To answer this question, I analyzed a single BAM file in two ways:


- One is counting the number of "*" values in column 7 (the RNEXT field), because according to the official SAM file specification this could mean that the mate is unmapped ("This field is set as ‘*’ when the information is unavailable"). In this case, I got over 65000 sequences that could have an unmapped mate.

- However, if I run

samtools view file.bam -f 8 | wc -l

I end up with only 2903 sequences.
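The two approaches above are not measuring the same thing. The Python sketch below runs both tests on a few made-up records (hypothetical read names, truncated to the first seven SAM columns). It assumes the unmapped mate is placed at its partner's position with RNEXT "=", as the SAM specification recommends; real aligners may fill these fields differently.

```python
# Hypothetical records: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT
records = """\
pairA\t99\tchr1\t100\t60\t50M\t=
pairA\t147\tchr1\t300\t60\t50M\t=
pairB\t73\tchr1\t500\t60\t50M\t=
pairB\t133\tchr1\t500\t0\t*\t=
pairC\t77\t*\t0\t0\t*\t*
pairC\t141\t*\t0\t0\t*\t*
single\t0\tchr1\t700\t60\t50M\t*
"""

star_in_rnext = []      # FLAGs of records whose RNEXT (column 7) is "*"
mate_unmapped_bit = []  # FLAGs of records with bit 0x8 set (what -f 8 selects)

for line in records.splitlines():
    fields = line.split("\t")
    flag, rnext = int(fields[1]), fields[6]
    if rnext == "*":
        star_in_rnext.append(flag)
    if flag & 0x8:
        mate_unmapped_bit.append(flag)

print(star_in_rnext)      # [77, 141, 0]
print(mate_unmapped_bit)  # [73, 77, 141]
```

In this toy example the two sets only overlap on the pair where both mates are unmapped: the unpaired read has "*" without bit 0x8, and the mapped read with an unmapped mate (73) has bit 0x8 without "*".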

One possibility is that, with the -f option, the program looks for a FLAG that is exactly "8". But when I look at column 2 of the BAM file, I don't find any FLAG that is 8 alone. That convinced me that samtools view -f FLAG matches any FLAG value whose combination of bits includes the 8, and thus it should tell me how many mates are unmapped.
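The difference between looking for a "lonely" 8 and looking for the 8 bit inside a combination can be sketched in Python. The helper has_all_bits is hypothetical (not part of samtools) and the FLAG values are made up; it only illustrates the "all bits of the given value must be set" test.

```python
def has_all_bits(flag: int, required: int) -> bool:
    """True when every bit of `required` is also set in `flag`."""
    return (flag & required) == required


flags = [99, 147, 73, 133, 77, 141]  # hypothetical FLAG column values

exact_eight = [f for f in flags if f == 8]                 # a "lonely" 8
contains_eight = [f for f in flags if has_all_bits(f, 8)]  # bit 0x8 set

print(exact_eight)     # []
print(contains_eight)  # [73, 77, 141]
# With more than one bit, all of them must be present: 12 = 4 + 8.
print([f for f in flags if has_all_bits(f, 12)])  # [77, 141]
```

No FLAG equals 8 on its own here, yet three of them contain the 8 bit, which matches what the question observed in column 2.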

With all this information, I am still not fully confident about the right answer to this question. Either I have serious doubts about the usefulness of the -f option in the samtools view command, or maybe many other "lonely" flags should be included in the search because I am missing some important information and/or the orientation does not seem to matter when the BAM file is generated.

Answer by Devon Ryan (Freiburg, Germany), 3.5 years ago:

-f 8 is an AND operation, so it just checks that all bits set in 8 are also set in the flag (-F, on the other hand, excludes a read as soon as a single bit overlaps between the value provided and the flag). Anyway, the most robust method would be samtools view -cf8 -F 4 file.bam (-c prints the number of matching alignments instead of the alignments themselves). This discounts cases where both mates in a pair are unmapped, since I assume you don't care about those. Note that using column 7 to get this is risky, since it relies on aligners behaving in a way that isn't guaranteed (e.g., some aligners will give mapping coordinates to unmapped reads).
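A small Python sketch of the -f/-F logic described above, applied to made-up FLAG values. The function samtools_like_keep is hypothetical and is not samtools' own code; it only mimics the two bit tests (all -f bits present, no -F bit present).

```python
def samtools_like_keep(flag: int, require: int = 0, exclude: int = 0) -> bool:
    """Keep a record if all `require` bits are set and no `exclude` bit is set."""
    return (flag & require) == require and (flag & exclude) == 0


flags = [99, 147, 73, 133, 77, 141]  # hypothetical FLAG column values

# Analogue of `samtools view -c -f 8`: any record whose mate is unmapped.
print(sum(samtools_like_keep(f, require=8) for f in flags))             # 3
# Analogue of `samtools view -c -f 8 -F 4`: mapped reads with an unmapped mate.
print(sum(samtools_like_keep(f, require=8, exclude=4) for f in flags))  # 1
```

Adding -F 4 drops 77 and 141, the pair in which both mates are unmapped, leaving only 73.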


So what exactly produces the "*" value in the 7th column?

Since you said "more robust", I take it you believe that 2903 is the correct answer?

 

(Reply by Antonio R. Franco, 3.5 years ago)

The asterisk is from the aligner, each of which has different quirks in its output. For example, I recall that some aligners won't set mate alignment information if they align mates as singletons, though I don't recall exactly which ones do this.

(Reply by Devon Ryan, 3.5 years ago)