I need some suggestions on how to calculate the zygosity of large TE indels in different samples which are mapped to the reference. I am working on a diploid genome for which I have already mapped and inferred the indels. I also have the information about the number of split/discordant reads that support the indel at a given position per sample.
I have a bed file with coordinates of insertion of transposable elements, and another with coordinates of deletion for each sample. With the bam alignment files for each sample I was thinking I can get the properly mapped reads with
samtools view -f 2 -F 4 input.bam "chr:start-stop". For deletion variants, I was considering to count it as heterozygous if there is at least one properly mapped read at the position in a sample.
For insertion variants, I was considering to take the average of the properly mapped reads in at least 100 bp flanking region. I was then considering to divide the number of split/discordant reads with this number of properly mapped reads in 100 bp flanking.
Any idea if this would work? Please let me know if you have any suggestions.
Thanks in advance