Hello,
the MACSE_V2 toolkit provides several tools to deal with nucleotide coding sequences. The alignSequences subprogram of MACSE allows building reliable codon alignments even in the presence of frameshifts or stop codons (especially useful for dN/dS analysis and pseudogene analysis). Moreover, this subprogram can handle the fact that different sequences use different genetic codes. MACSE also includes a subprogram specifically designed to replace stop codons (and frameshift codons) in an alignment. This subprogram (exportAlignment) lets you specify the codon (three letters of your choice) that will replace the stop codons. You can even provide two different codons: one for replacing stops appearing within the sequence (unexpected except in pseudogenes) and one for stop codons appearing at the end of the sequences. While there are several options (e.g. to specify the output file name and the genetic code to use), the basic usage is quite straightforward:
java -jar macse.jar -prog exportAlignment -align align.fasta -codonForFinalStop --- -codonForInternalStop NNN
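If you have many alignments to process, the same call can be scripted. Here is a minimal Python sketch, assuming java is on your PATH and macse.jar sits in the working directory; the function names, the directory layout and the "*.fasta" pattern are made up for illustration, and only the command-line options shown above are used:

```python
import subprocess
from pathlib import Path


def build_macse_command(align_path, jar="macse.jar",
                        final_stop="---", internal_stop="NNN"):
    """Build the exportAlignment call shown above as an argument list.

    The jar location and the default replacement codons are assumptions
    taken from the example command; adjust them to your setup.
    """
    return [
        "java", "-jar", str(jar),
        "-prog", "exportAlignment",
        "-align", str(align_path),
        "-codonForFinalStop", final_stop,
        "-codonForInternalStop", internal_stop,
    ]


def run_on_directory(directory, pattern="*.fasta", jar="macse.jar"):
    """Run exportAlignment on every alignment matching `pattern`.

    check=True makes the loop stop at the first alignment MACSE
    fails on, instead of silently skipping it.
    """
    for path in sorted(Path(directory).glob(pattern)):
        subprocess.run(build_macse_command(path, jar=jar), check=True)
```

Passing the arguments as a list (rather than one shell string) avoids any quoting problem with the "---" codon.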
To ease the alignment of coding nucleotide sequences, we also provide ready-to-use alignment pipelines (provided as Singularity containers), which include optional filtering steps. These pipelines output the (filtered) nucleotide alignment, the corresponding (filtered) amino acid alignment and the details of the filtering steps (if any filtering steps were selected).
Which aligner would you like to use? Most, if not all, have a command-line interface. Deleting the stop codons afterwards should be "trivial". E.g. Biopython has an interface for most aligners and will run PAML as well. However, please keep in mind that dN/dS calculations are (obviously) very dependent on a good alignment. The big downside of this automated approach is that you will likely not quality-check each alignment before moving on.
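To give an idea of what "trivial" could look like, here is a minimal plain-Python sketch that masks stop codons in a single sequence. It assumes the sequence starts in reading frame 1 (as in a codon alignment) and uses the standard genetic code only; the function name and the default replacement codons are just illustrative choices:

```python
# Stop codons of the standard genetic code (assumption: other codes
# would need a different set).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def mask_stop_codons(seq, internal="NNN", final="---"):
    """Replace in-frame stop codons in one (aligned) coding sequence.

    Assumes the sequence starts in frame 1. A stop codon counts as
    "final" when it is the last codon that is not made of gaps only;
    every other stop codon is treated as internal.
    """
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # Index of the last codon containing at least one non-gap character.
    last_real = max(
        (i for i, codon in enumerate(codons) if set(codon) - {"-"}),
        default=-1,
    )
    masked = []
    for i, codon in enumerate(codons):
        # Compare case-insensitively and accept RNA input (U for T).
        if codon.upper().replace("U", "T") in STOP_CODONS:
            masked.append(final if i == last_real else internal)
        else:
            masked.append(codon)
    return "".join(masked)


print(mask_stop_codons("ATGTAAGGGTGA"))  # ATGNNNGGG---
```

This does not detect frameshifts, so it is only safe on sequences that are already in frame; a stop codon that is only visible in another frame is left untouched.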
Hello, I have a similar problem. I am working with a large dataset and must delete the stop codons in the sequences. Can you give me some suggestions?
Thanks!