Question: bam-readcount : Reference_base is not the base with highest count
4.0 years ago by
United States
I recently ran bam-readcount on my BAM file and noticed something odd in the output:

chr1    201334382   G   58077   =:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00  A:57989:238.35:35.90:0.01:34667:23322:0.51:0.01:43.91:34670:0.52:122.12:0.49    C:8:255.00:33.00:0.00:6:2:0.37:0.02:56.38:6:0.76:120.62:0.62    G:61:242.61:34.13:0.00:41:20:0.37:0.01:20.82:41:0.77:116.07:0.60    T:17:255.00:35.24:0.00:15:2:0.25:0.02:50.18:15:0.76:117.06:0.69 N:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00  -G:2:255.00:0.00:0.00:1:1:0.68:0.02:21.00:1:0.38:124.50:0.34

Clearly, A has the highest base count (57989). Then why is the reference base being reported as G?

The reference base is taken from your reference sequence and reported as is, regardless of the read counts, no?

4.0 years ago by
The output format is documented in the bam-readcount GitHub repository:

chr position    reference_base  depth   base:count:avg_mapping_quality:avg_basequality:avg_se_mapping_quality:num_plus_strand:num_minus_strand:avg_pos_as_fraction:avg_num_mismatches_as_fraction:avg_sum_mismatch_qualities:num_q2_containing_reads:avg_distance_to_q2_start_in_q2_reads:avg_clipped_length:avg_distance_to_effective_3p_end   ...

So, in your case, a G is reported because you have a G in your reference genome.
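To make the distinction concrete, here is a minimal Python sketch that splits the line from the question according to the format above and compares the reference base with the most frequent observed base. The function name `parse_readcount_line` is made up for illustration and is not part of bam-readcount; the sketch only reads the first two fields (`base` and `count`) of each per-base block.

```python
def parse_readcount_line(line):
    """Split one bam-readcount output line into its leading columns
    and a dict of per-base counts (hypothetical helper, not part of the tool)."""
    cols = line.split()
    chrom, pos, ref_base, depth = cols[0], int(cols[1]), cols[2], int(cols[3])
    counts = {}
    for block in cols[4:]:
        fields = block.split(":")
        # fields[0] is the base (or indel label), fields[1] is its count
        counts[fields[0]] = int(fields[1])
    return chrom, pos, ref_base, depth, counts


# The line from the question, copied as is.
line = (
    "chr1 201334382 G 58077 "
    "=:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 "
    "A:57989:238.35:35.90:0.01:34667:23322:0.51:0.01:43.91:34670:0.52:122.12:0.49 "
    "C:8:255.00:33.00:0.00:6:2:0.37:0.02:56.38:6:0.76:120.62:0.62 "
    "G:61:242.61:34.13:0.00:41:20:0.37:0.01:20.82:41:0.77:116.07:0.60 "
    "T:17:255.00:35.24:0.00:15:2:0.25:0.02:50.18:15:0.76:117.06:0.69 "
    "N:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 "
    "-G:2:255.00:0.00:0.00:1:1:0.68:0.02:21.00:1:0.38:124.50:0.34"
)

chrom, pos, ref_base, depth, counts = parse_readcount_line(line)
top_base = max(counts, key=counts.get)

print(f"{chrom}:{pos} reference base = {ref_base} (count {counts[ref_base]})")
print(f"most frequent observed base = {top_base} (count {counts[top_base]})")
```

The third column is simply copied from the reference, so it does not have to match the base with the highest count.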

OK. So does this basically mean that the genotype is homozygous for the alternative allele?
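As a rough check, the fraction of reads supporting each base can be computed from the counts in the line above. This is only a sketch: bam-readcount reports counts and per-base metrics rather than genotypes, so a fraction close to 1 for A is consistent with a homozygous alternative call in a diploid sample but does not replace a variant caller. The helper name `allele_fractions` is hypothetical.

```python
def allele_fractions(counts, depth):
    """Return each base's count as a fraction of the reported depth
    (hypothetical helper for illustration)."""
    return {base: count / depth for base, count in counts.items()}


# Counts and depth copied from the bam-readcount line in the question.
depth = 58077
counts = {"A": 57989, "C": 8, "G": 61, "T": 17}

fractions = allele_fractions(counts, depth)
for base, frac in sorted(fractions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{base}: {frac:.4f}")
```

Here A accounts for roughly 99.8% of the reads and the reference base G for about 0.1%.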

written 4.0 years ago by Apoorva280
