I have a large number of complete genomes (downloaded from NCBI) that all belong to the same bacterial species. As recommended in the MUSCLE guidelines, I have already clustered the genes with Usearch (Uclust) and divided my data into separate gene clusters. I am running the alignment in MEGA on Windows. However, there are two problems in the gene cluster I am interested in:
- the gene sequences within the same cluster are not of equal length.
- a few gene sequences in that cluster have a stop codon at the start or in the middle of the sequence (due to nonsense mutations, i.e. non-synonymous changes that create a stop codon).
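To make the second problem concrete, here is a minimal Python sketch of how such sequences can be flagged. It assumes the sequences are nucleotide sequences read in frame from their first base and that the stop codons are TAA, TAG and TGA; the function name `internal_stop_positions` is only illustrative and not part of MUSCLE, Usearch or MEGA.

```python
# Stop codons assumed for this sketch.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def internal_stop_positions(seq):
    """Return 0-based nucleotide offsets of in-frame stop codons
    that occur before the last complete codon of the sequence."""
    seq = seq.upper().replace("-", "")  # ignore alignment gaps, if any
    n_codons = len(seq) // 3            # trailing partial codon is ignored
    positions = []
    # The last complete codon is skipped, since a stop there is expected.
    for i in range(n_codons - 1):
        codon = seq[3 * i:3 * i + 3]
        if codon in STOP_CODONS:
            positions.append(3 * i)
    return positions

if __name__ == "__main__":
    # Made-up example: ATG TAA GGG TGA has an early stop at offset 3.
    print(internal_stop_positions("ATGTAAGGGTGA"))
```

An empty list means no premature stop codon was found in that frame.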
If I remove the sequences with stop codons at the start or in the middle, I would effectively be removing those genomes from my study, and I don't want that. Is it possible to solve this issue without excluding those genomes?
Can MUSCLE or Usearch replace the gaps or stop codons with other characters, just so that the aligned sequences end up the same length?
What I need:
- MUSCLE should finish the alignment, and the results for that gene cluster should not be affected by the problems mentioned above.
- all sequences in the resulting alignment should be of equal length.
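The equal-length requirement can be checked on an aligned FASTA export with a short sketch like the one below. It assumes the alignment has been saved as FASTA text with gaps written as `-`; the helper names `read_fasta` and `alignment_lengths` and the example records are made up for illustration.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping header -> sequence string."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}

def alignment_lengths(records):
    """Return the set of distinct sequence lengths (gaps included)."""
    return {len(seq) for seq in records.values()}

if __name__ == "__main__":
    # Made-up aligned records; a real file would be read with open(...).read().
    example = ">genome_A\nATG-CGT\nTAA\n>genome_B\nATGACGT\nT-A\n"
    lengths = alignment_lengths(read_fasta(example))
    print("equal length" if len(lengths) == 1 else f"mixed lengths: {lengths}")
```

A single value in the returned set means every row of the alignment has the same length.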
PS: So far I have only tried MUSCLE as included in MEGA 7, not the command-line version. If a solution exists in the command-line version, I can try that as well.