Hi guys, if ambiguous nucleotides appear in your sequences, do they affect sequence analysis in general? Researchers usually remove ambiguous nucleotides because they introduce noise and affect the analysis. Is that correct?
Or do ambiguous nucleotides carry any importance that we shouldn't ignore?
Thank you very much for all the comments in advance :)
Please define "sequence analysis".
Sequence analysis here means coronavirus genome sequence analysis. I did a multiple sequence alignment (MSA) of the human coronaviruses and saw that there are a lot of ambiguous nucleotides in the alignment. The MSA was done as a step in a comparative genome analysis, to compare human CoV genomes and find the effects of the differences (indels, substitutions and conservation) identified within the alignment. I'm not sure whether the ambiguous nucleotides should be ignored or not, because a lot of them are 'N', which is unknown, and a small number of them are Y, R, W, S, K and M, for which it is hard to predict the specific nucleotide.
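For reference, these letters follow the IUPAC convention, where each code stands for a fixed set of possible bases (for example, R is A or G and Y is C or T), while N can be any base. Below is a minimal sketch in plain Python for tallying them per sequence before deciding what to do with them. The `alignment` dictionary and its sequence names are made-up toy data, not real coronavirus sequences, and `count_ambiguous` is just an illustrative helper:

```python
from collections import Counter

# Standard IUPAC nucleotide ambiguity codes and the bases each can stand for.
IUPAC_AMBIGUOUS = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def count_ambiguous(seq):
    """Return a Counter of the ambiguity codes found in one sequence."""
    return Counter(base for base in seq.upper() if base in IUPAC_AMBIGUOUS)

# Hypothetical toy alignment (names and sequences are illustrative only).
alignment = {
    "seq_A": "ATGNNNCTA-GRT",
    "seq_B": "ATGACYCTAAGAT",
    "seq_C": "ATGACTCTA-GAT",
}

ambiguity_counts = {name: count_ambiguous(seq) for name, seq in alignment.items()}
for name, counts in ambiguity_counts.items():
    print(name, dict(counts), "total:", sum(counts.values()))
```

Gap characters (`-`) are not counted here, since they are alignment gaps rather than ambiguity codes.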
My plan is to ignore them for now. After I have identified the differences within the alignment and begun to study the effects of the identified genome regions, I will go back and check whether ambiguous nucleotides are present in those regions. Thank you :)
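The "look back" step in this plan could be sketched roughly as follows, assuming the alignment is held as a dictionary of equal-length aligned strings and the region of interest is given as 1-based, inclusive alignment columns. The function name `ambiguous_in_region`, the toy sequences and the column range are all hypothetical:

```python
# IUPAC ambiguity codes (anything other than A, C, G, T and the gap character).
AMBIGUOUS = set("RYSWKMBDHVN")

def ambiguous_in_region(alignment, start, end):
    """List (name, column, code) for ambiguity codes in columns start..end.

    Columns are 1-based alignment columns and `end` is inclusive.
    """
    hits = []
    for name, seq in alignment.items():
        for col in range(start, min(end, len(seq)) + 1):
            base = seq[col - 1].upper()
            if base in AMBIGUOUS:
                hits.append((name, col, base))
    return hits

# Hypothetical toy alignment (names and sequences are illustrative only).
alignment = {
    "seq_A": "ATGNNNCTA-GRT",
    "seq_B": "ATGACYCTAAGAT",
    "seq_C": "ATGACTCTA-GAT",
}

# Check a region flagged earlier as different between genomes (columns 4 to 6).
region_hits = ambiguous_in_region(alignment, 4, 6)
for name, col, code in region_hits:
    print(f"{name}: column {col} is ambiguous ({code})")
```

If a flagged region returns hits, the apparent difference there may come from low-quality base calls rather than a real substitution, so it is worth checking before interpreting it.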