**10**wrote:

Hi all, bioinformatics noob here.

I have been trying to call variants and don't fully understand what the I16 tag's first 4 entries mean. I found this table:

```
1 #reference Q13 bases on the forward strand 2 #reference Q13 bases on the reverse strand
3 #non-ref Q13 bases on the forward strand 4 #non-ref Q13 bases on the reverse strand
5 sum of reference base qualities 6 sum of squares of reference base qualities
7 sum of non-ref base qualities 8 sum of squares of non-ref base qualities
9 sum of ref mapping qualities 10 sum of squares of ref mapping qualities
11 sum of non-ref mapping qualities 12 sum of squares of non-ref mapping qualities
13 sum of tail distance for ref bases 14 sum of squares of tail distance for ref bases
15 sum of tail distance for non-ref bases 16 sum of squares of tail distance for non-ref
```

And learnt that Q13 means a base quality of bigger than 13 Phred Score.

But I dont understand why sometimes I am getting variants that seem to be in the entry 2 rather than entry 3. Can someone give an example of when I should expect to see an entry in 2 (reference bases on the reverse strand) rather than in 3 (no-ref bases in the forward strand) or in 4 (no-ref bases in the reverse strand). I have been trying to see how the results match up to my sam file but can't find how they are consistent.

Bonus question:
I am trying to extract the number of 'mutant' **paired** reads observed per reference genome. Eg. I have a genome with 40 pairs that have no snps,indels etc., 3 pairs that have one indel and 10 pairs that have one snp. Then I want to know that this genome had 13 'mutant' pairs that were read.
In order for me to do this it seems I need 2 things. 1. To be able to first get the number of paired snps/indels from my sam file. `bcftools call`

and `bcftools view`

dont seem to be doing this because of the I16 outputs I dont understand.
2. A way to extract this information. `bcftools stats`

seems to aggregate snps and indels, etc rather than keeping them separated out by reference genome.

**14k**• written 2.1 years ago by tbb21 •

**10**