How To Interpret Output From Bam-Readcount
1
1
Entering edit mode
10.6 years ago
merbergd ▴ 10

After reading the documentation at https://github.com/genome/bam-readcount/blob/master/README.textile there is one column in the output from bam-readcount that I still don't understand.

Columns 1-4 are chromosome, position, reference base, readcount. Columns 6-9 are colon separated values reflecting counts and quality metrics for reads with A, C, G, T, and ambiguous calls at the current position.

But what is column 5? In my output, it begins with an "=", then every colon separated value is "0" in each row.

Here is a sample row:

1 897037 C 27 =:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 A:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 C:25:51.36:30.44:35.72:12:13:0.44:0.01:8.20:13:0.41:74.48:0.51 G:2:60.00:32.50:37.00:1:1:0.58:0.01:32.50:1:0.24:76.00:0.28 T:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 N:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00

Thank you very much.

samtools • 9.4k views
ADD COMMENT
0
Entering edit mode

Interestingly from going through the source code, it looks like that the "=" call should only have counts when bam1_seqi returns a 0, which shouldn't happen.

ADD REPLY
0
Entering edit mode

I'm trying to understand bam1_seqi from the samtools distribution and the documentation at http://samtools.sourceforge.net/samtools/bam/PDefines/PDefines.html. Is it correct that bam1_seqi typically returns a base (or more accurately, a number that represents a base). Could it be that it returns an "=' when there is an error condition?

ADD REPLY
0
Entering edit mode

My comment was probably not very informative, sorry about that! From perusing the source code for bam-readcount, it seems that legitimate basecalls are "=", "A", "C", "G", "T", and "N". I had sort of assumed that "=" might mean "same as the reference", which would be useful, however there's nothing I could find to indicate that. Instead, what I found is that, for each read covering some base in the output in a pileup (from samtools, bam-readcount uses its C API), it finds the position in the read covering it and increments the various statistics according to the nucleotide at that position. It gets the nucleotide from a call to bam1_seqi on each read. bam1_seqi should only return 1, 2, 4, 8, and 15 (for A, C, G, T and N, respectively). The metrics for "=" will be updated when a 0 is returned, but that shouldn't occur. I wonder if this is just a place-holder for intended functionality then (unless I just missed something in the source code). If you don't get a reply from the author of bam-readcount in the next day or so, perhaps send him/her an email, since this seems undocumented.

BTW, your suspicion of only updating "=" on an error would be another good explanation.

ADD REPLY
0
Entering edit mode

Thank you for a very detailed reply!

ADD REPLY
2
Entering edit mode
10.5 years ago
ernfrid ▴ 400

The intent was to support BAM files where basecalls matching the reference are reported as an =. Based on the samtools code, bam_seqi should return a 0 if this character is used. Rather than doing the correct thing and interpreting those as the actual reference base, this was implemented as a separate bin to specifically track those bases reported as '='. I have yet to encounter a BAM encoded in this fashion and so the '=' section continues to sit by itself as speculative code. I'm loathe to remove it at this point as I don't want to break anyone's parsers unnecessarily.

ADD COMMENT
0
Entering edit mode

Out of curiosity, which version of samtools was this output of bam1_seqi based on? The recent versions of samtools that I've seen don't utilize a reference fasta here, but it'd be good to know if this was the case in some older version.

ADD REPLY
1
Entering edit mode

Well, I'm basing this on the current and past pileup code: https://github.com/samtools/samtools/blob/develop/bam_plcmd.c#L74-L77 and not the implementation of bam1_seqi. It may be that the bam1_seqi has moved on from this functionality and what's in pileup_seq is cruft.

ADD REPLY
0
Entering edit mode

Interesting, I guess I'd never looked at the pileup code. Thanks!

ADD REPLY

Login before adding your answer.

Traffic: 2708 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6