Question: How can I distinguish artifacts from real indels?
I’m working with multiple sequences (FASTA format) of the same human gene (exons 2, 3, and 4). Each sequence is about 5,300 nucleotides long. When I import the sequences into MEGA, I see multiple “-” deletions and “|” insertions in seemingly random places. The sequences came from software that assigned a genotype allele to each one. When I view multiple sequences that presumably belong to the same genotype allele, they don’t align exactly because of these “artifacts”.

Should I assume these indels are artifacts and remove them before doing the alignment? My goal is to find new variants outside exon 2. How can I tell whether a given indel is a real variant rather than an artifact?

Any advice will be greatly appreciated.


