I am trying to construct B-allele frequency (BAF) plots from NGS data, and for that I am using samtools mpileup output at positions of interest.
Unfortunately, I cannot establish a one-to-one correspondence between the characters in the read base string [column 5] and the base quality string [column 6] for lines that contain indels.
Here is an example from the samtools SourceForge page:
seq2 156 A 11 .$......+2AG.+2AG.+2AGGG <975;:<<<<<
According to column 4 (the read depth) and the length of column 6 (the quality string), there are 11 reads aligned to this position. However, when I parse the base string in column 5, I get:
- 9 reads matching the reference base (A)
- 3 reads with an insertion of AG
- 2 reads with a G
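For comparison, here is a minimal Python sketch of a tokenizer written against the mpileup format description. It assumes Phred+33 base qualities and the following conventions: `^` marks a read start and is followed by one mapping-quality character, `$` marks a read end, and `+N`/`-N` is followed by N inserted or deleted bases that annotate the read whose base precedes the marker. None of these markers consume a base-quality character. The helper names (`tokenize_bases`, `pair_with_quals`, `b_allele_frequency`) and the `min_baseq` threshold are hypothetical and are not part of samtools.

```python
# Sketch only: function names and min_baseq are hypothetical.
# Read-base column conventions assumed here:
#   '^' + 1 mapping-quality char : read start (consumes no base quality)
#   '$'                          : read end   (consumes no base quality)
#   '+N' / '-N' followed by N bases : indel after the previous base (no quality)
#   anything else ('.', ',', ACGTN, acgtn, '*', '>', '<') : one read's base


def tokenize_bases(bases):
    """Return one character per read covering the position."""
    tokens = []
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            i += 2  # skip '^' and the mapping-quality char after it
        elif c == "$":
            i += 1
        elif c in "+-":
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            if j == i + 1:
                raise ValueError(f"indel marker without length at offset {i}")
            i = j + int(bases[i + 1:j])  # skip the N inserted/deleted bases
        else:
            tokens.append(c)
            i += 1
    return tokens


def pair_with_quals(bases, quals):
    """Pair each read's base with its base-quality character."""
    tokens = tokenize_bases(bases)
    if len(tokens) != len(quals):
        raise ValueError(f"{len(tokens)} bases but {len(quals)} qualities")
    return list(zip(tokens, quals))


def b_allele_frequency(pairs, ref, alt, min_baseq=0, offset=33):
    """Return alt / (ref + alt) over bases with quality >= min_baseq."""
    n_ref = n_alt = 0
    for base, q in pairs:
        if ord(q) - offset < min_baseq:
            continue  # drop low-quality bases
        b = base.upper()
        if b in (".", ",", ref.upper()):
            n_ref += 1
        elif b == alt.upper():
            n_alt += 1
    total = n_ref + n_alt
    return n_alt / total if total else float("nan")


pairs = pair_with_quals(".$......+2AG.+2AG.+2AGGG", "<975;:<<<<<")
print(len(pairs))                                     # 11
print(round(b_allele_frequency(pairs, "A", "G"), 3))  # 0.182
```

Under these assumptions each `+2AG` run is skipped as an annotation on the read shown by the `.` before it, so the example line yields 9 reference matches and 2 G's, one per quality character.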
That adds up to 14 reads rather than 11. Can anyone please explain what is going on?
Thank you in advance!