I'm calling mitochondrial variants with Mutect2, and one variant looks like an artifact, but I don't understand what could be causing it. In IGV (picture below), the variant always appears at the same position in both forward and reverse reads. The artifact might also be caused by the repeat sequence (see the image below).
Here is the VCF line for the variant:
chrM 1620 . A C . PASS AS_FilterStatus=SITE;AS_SB_TABLE=143,154|4,4;DP=307;ECNT=7;FS=0.000;MBQ=28,30;MFRL=0,0;MMQ=60,60;MPOS=43;OCM=0;POPAF=2.40;TLOD=17.31 GT:AD:AF:DP:F1R2:F2R1:FAD:PGT:PID:PS:SB 0|1:297,8:0.029:305:132,4:143,4:297,8:0|1:1620_A_C:1620:143,154,4,4
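To make the numbers in that line easier to read, here is a minimal sketch that splits the record into its INFO and FORMAT fields and derives a raw allele fraction and the strand split of the alt reads. It assumes the SB field is ordered ref-forward, ref-reverse, alt-forward, alt-reverse (consistent with AS_SB_TABLE=143,154|4,4 above); the function names `parse_vcf_line` and `summarize` are just illustrative.

```python
# The VCF record from above, whitespace-separated as pasted.
VCF_LINE = (
    "chrM 1620 . A C . PASS "
    "AS_FilterStatus=SITE;AS_SB_TABLE=143,154|4,4;DP=307;ECNT=7;FS=0.000;"
    "MBQ=28,30;MFRL=0,0;MMQ=60,60;MPOS=43;OCM=0;POPAF=2.40;TLOD=17.31 "
    "GT:AD:AF:DP:F1R2:F2R1:FAD:PGT:PID:PS:SB "
    "0|1:297,8:0.029:305:132,4:143,4:297,8:0|1:1620_A_C:1620:143,154,4,4"
)


def parse_vcf_line(line):
    """Split a single-sample VCF record into INFO and sample dictionaries."""
    fields = line.split()
    info = {}
    for item in fields[7].split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            info[key] = value
        else:
            info[item] = True  # flag-type INFO entry without a value
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    return info, sample


def summarize(line):
    """Derive a few read-count ratios from the AD and SB fields."""
    info, sample = parse_vcf_line(line)
    ref_depth, alt_depth = (int(x) for x in sample["AD"].split(","))
    # Assumed order: ref forward, ref reverse, alt forward, alt reverse.
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in sample["SB"].split(","))
    return {
        "ref_depth": ref_depth,
        "alt_depth": alt_depth,
        "raw_af": alt_depth / (ref_depth + alt_depth),
        "alt_forward_fraction": alt_fwd / (alt_fwd + alt_rev),
        "ref_forward_fraction": ref_fwd / (ref_fwd + ref_rev),
        "mpos": int(info["MPOS"]),
        "ecnt": int(info["ECNT"]),
    }


summary = summarize(VCF_LINE)
for key, value in summary.items():
    print(key, value)
```

The alt reads are split evenly between strands (4 forward, 4 reverse), which fits FS=0.000, so strand bias alone would not flag this site. ECNT=7 suggests several other events were called in the same assembled region, which may be worth checking alongside the repeat.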
What do you think of this? Have you ever seen this type of artifact? How can I filter it out?
It might be related to soft-clipped bases (see image here). Most of the reads carrying the variant also have soft-clipped bases at the start or end of the read. From what I understand, the reads supporting the variant have three parts:
- a region mapping to "position 1" in the genome
- a repeat region occurring in "position 1" and "position 2" of the genome (very similar except for the two variant bases)
- a region mapping to "position 2" in the genome
I don't know how to interpret this. Most of the read bases map to "position 1", but I don't know why some also map well to "position 2".
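One way to put a number on the soft-clipping observation is to compare how often alt-supporting and ref-supporting reads are soft-clipped. The sketch below only parses CIGAR strings; the CIGAR lists in it are made-up placeholders, not my real reads, and `soft_clips` and `clipped_fraction` are hypothetical helper names.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar):
    """Return (left, right) soft-clip lengths for a CIGAR string."""
    # Hard clips sit outside soft clips, so drop them before checking the ends.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return (0, 0)  # e.g. "*" for a read without an alignment
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return (left, right)


def clipped_fraction(cigars, min_clip=1):
    """Fraction of reads with a soft clip of at least min_clip bases at either end."""
    if not cigars:
        return 0.0
    clipped = sum(1 for c in cigars if max(soft_clips(c)) >= min_clip)
    return clipped / len(cigars)


# Placeholder CIGARs for illustration only (NOT taken from the real BAM).
alt_cigars = ["101M50S", "45S106M", "151M", "98M53S"]
ref_cigars = ["151M", "151M", "3S148M", "151M"]

print("alt reads clipped:", clipped_fraction(alt_cigars, min_clip=10))
print("ref reads clipped:", clipped_fraction(ref_cigars, min_clip=10))
```

With real data, the CIGAR strings could come from the reads overlapping chrM:1620, split by whether they carry the C allele (for example via `cigarstring` on pysam reads, assuming pysam is available). If the alt reads turn out to be clipped far more often than the ref reads, that would be consistent with the reads spanning the repeat and partly belonging to "position 2".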