EDIT1: Variant calling was done with two commonly used variant callers ( VarScan and Strelka ). The data is paired end RNA-Seq data. The presumptive variants are from exome sequencing.
so I am trying to count reads over a presumptive SNP position with their respective nucleotides.
E.g the REF is A, the ALT is C and the position has 50 reads for A, and 23 reads for C.
I am currently using bam-readcount for this purpose but I am little worried that my results might be skewed.
I am note sure if ( because I did not find it in the documentation ) bam-readcount handles paired-end data.
For example if two overlapping mates cover the base, and they both show the ALT base, shouldn't they be technically only counted as one occurrence since the fragment has only been sequenced once but from two sites ?
What happens if one mate only does overlap the ALT position ? How is that counted ?
I assume that bam-readcount is aware of the pairs and counts overlapping pairs at a position as one while single read also counts as 1.
After doing some checking, I found that igv count always reports about twice as many reads at a given position as bam-readcount which leads me to believe that possibly, bam-readcount is actually doing what I assumed but I am not sure.
Maybe some of you have an idea if I am right or can suggest a different approach.
Or maybe I am entirely wrong and you guys can point me in the right direction.
I added some information as an edit. Especially the fact, that the SNPs were discovered using exome data and variant callers. I am trying to check if these SNPs are actually expressed.