I have a FASTA file containing ~500 DNA sequences of a specific gene collected from various yeast strains. Each sequence is labeled with the corresponding strain name in the header. I would like to identify mutational variants—both SNPs and indels—across these sequences. My goal is to annotate these variants at both the cDNA and protein levels.
While identifying SNPs and translating them into protein variants seems relatively straightforward, handling indels has proven challenging. An indel whose length is not a multiple of three shifts the reading frame, so every downstream codon changes, and I am unsure how to determine its effect on the translated peptide sequence correctly.
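To make the indel problem concrete, here is a minimal sketch in plain Python (no external libraries). The toy sequence and the function names `translate` and `indel_effect` are made up for illustration, and the sketch assumes each sequence is an intron-free coding sequence that starts at the first codon and uses the standard genetic code; none of this is taken from an existing tool.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence up to and including the first stop codon.

    A trailing partial codon is ignored; unknown codons (e.g. with N) become 'X'.
    """
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3].upper(), "X")
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def indel_effect(ref_allele, alt_allele):
    """Classify a variant by the length difference between its alleles."""
    delta = len(alt_allele) - len(ref_allele)
    if delta == 0:
        return "substitution"
    return "in-frame indel" if delta % 3 == 0 else "frameshift"


# Hypothetical toy reference CDS: ATG GCT GAA AAG TAA
reference = "ATGGCTGAAAAGTAA"
one_base_deletion = reference[:3] + reference[4:]    # drops the G at position 4
three_base_deletion = reference[:3] + reference[6:]  # drops the whole GCT codon

print(translate(reference))            # MAEK*
print(translate(one_base_deletion))    # MLKS  (frame shifted, stop codon lost)
print(translate(three_base_deletion))  # MEK*  (one residue removed, frame kept)
print(indel_effect("G", ""))           # frameshift
print(indel_effect("GCT", ""))         # in-frame indel
```

The part I have not solved is upstream of this: calling the indels reliably from ~500 sequences (which presumably needs an alignment against a chosen reference) and then reporting each one in a consistent cDNA- and protein-level notation.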
I’m currently working in Python but am also open to R-based solutions. I would appreciate any recommendations for existing tools, workflows, or published scripts/tutorials that are designed for this use case—especially those that can correctly manage indels and their effect on protein sequences.
Thank you very much for your time and help.