9.4 years ago by
Santiago de Compostela, Spain
I see here a slight misconception of the term "synonymous SNP", so I think this should be clarified first. In its strict meaning, a synonymous SNP changes a base in the DNA sequence without affecting the amino acid translation, so the protein stays the same (definition #1). But I think you are using it in a slightly different way, to mean a SNP that generates the same kind of amino acid change in all your organisms (definition #2). I will try to address both cases.
If you are looking for non-synonymous SNPs in their strict meaning (definition #1), you need to translate each DNA sequence into its encoded protein with each SNP allele in place, then compare the resulting amino acids and keep the SNPs that changed the translation. This should be done for each sequence separately, since the strict definition doesn't depend on anything else.
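As a rough illustration of the per-sequence check described above, here is a minimal Python sketch. The function name `classify_snp` and its arguments are hypothetical, and it assumes the standard genetic code, an in-frame coding sequence that starts at codon 1, and 1-based SNP positions; a real pipeline would also need to handle strand, introns and ambiguous bases.

```python
from itertools import product

# Standard genetic code, listed in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(cds, dna_pos, alt_base):
    """Classify one SNP in a coding sequence (hypothetical helper).

    cds      -- in-frame coding DNA sequence, starting at codon 1 (assumption)
    dna_pos  -- 1-based position of the SNP within cds
    alt_base -- alternative allele at that position
    Returns (kind, reference amino acid, alternative amino acid).
    """
    cds = cds.upper()
    codon_start = (dna_pos - 1) // 3 * 3   # 0-based start of the affected codon
    offset = (dna_pos - 1) % 3             # 0, 1 or 2: position inside the triplet
    ref_codon = cds[codon_start:codon_start + 3]
    if len(ref_codon) < 3:
        raise ValueError("SNP falls in an incomplete codon")
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return kind, ref_aa, alt_aa
```

For example, in `ATGGCTGAA` a T>C change at position 6 turns `GCT` into `GCC` (both alanine), while a G>A change at position 7 turns `GAA` (glutamate) into `AAA` (lysine).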
But if you are looking for SNPs that create different protein sequences in different organisms, then you will need to perform the alignments you mention. Choosing whether to align DNA or protein sequences shouldn't matter, as long as your translating script takes the alignment results into account, but if you are asking for advice, I find aligning protein sequences and checking them for mismatches a little bit easier. Of course, the SNP position won't necessarily be the amino acid position multiplied by 3, as the SNP could sit on any of the 3 positions of the triplet, so you should store that information too in order to map the SNP back to its DNA position.
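To make the "any of the 3 positions of the triplet" point concrete, here is a small Python sketch that goes from a mismatching amino acid position back to the DNA position(s) responsible. The helper names are hypothetical, and it assumes two in-frame coding sequences of equal length with no gaps in the alignment and 1-based positions throughout.

```python
def codon_dna_positions(protein_pos):
    """Return the three 1-based DNA positions that encode a 1-based amino acid position."""
    first = 3 * (protein_pos - 1) + 1
    return [first, first + 1, first + 2]


def locate_snps(cds_a, cds_b, protein_pos):
    """List the DNA positions inside one codon where two coding sequences differ.

    Assumes cds_a and cds_b are in frame, gap-free and of equal length.
    """
    return [
        pos
        for pos in codon_dna_positions(protein_pos)
        if cds_a[pos - 1] != cds_b[pos - 1]
    ]
```

With `ATGGCTGAA` and `ATGGCTAAA`, amino acid 3 differs (E versus K), and `locate_snps` reports DNA position 7, the first base of that codon, rather than 3 × 3 = 9.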
EDIT: I tried to address 2 different ideas I thought you could have in mind, but I was always assuming that you were talking about SNPs. After reading Michael's comment on your question, I realize that you may not be intending to deal with SNPs at all, but with sequence mismatches. If that is the case, then I would suggest that you translate your DNA sequences into proteins, align both the DNA and the protein sequences (DNA with DNA and protein with protein, of course), and check whether the DNA mismatches correspond to protein mismatches. With both positions counted from 1, you can do the position check by evaluating the following expression:
if ( $proteinPosition == int ( ($dnaPosition-1) / 3 ) + 1 )
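The same arithmetic can be wrapped into a full mismatch check. The following is only a sketch in Python rather than Perl, with hypothetical function names; it assumes the DNA and protein alignments are gap-free, in frame and 1-based, so that DNA position d always maps to protein position int((d - 1) / 3) + 1. Alignments with gaps would need an extra coordinate-mapping step.

```python
def dna_to_protein_pos(dna_pos):
    """Map a 1-based DNA position to its 1-based amino acid position."""
    return (dna_pos - 1) // 3 + 1


def classify_mismatches(dna_a, dna_b, prot_a, prot_b):
    """Report each DNA mismatch and whether the matching amino acid also differs.

    Returns a list of (dna_pos, protein_pos, protein_changed) tuples.
    Assumes gap-free, in-frame sequences where prot_x is the translation of dna_x.
    """
    results = []
    for dna_pos, (base_a, base_b) in enumerate(zip(dna_a, dna_b), start=1):
        if base_a != base_b:
            protein_pos = dna_to_protein_pos(dna_pos)
            protein_changed = prot_a[protein_pos - 1] != prot_b[protein_pos - 1]
            results.append((dna_pos, protein_pos, protein_changed))
    return results
```

A DNA mismatch flagged with `protein_changed` set to `False` is silent at the protein level, while one flagged `True` coincides with a protein mismatch.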