How To Do Alignment, Stop Codon Removal And dN/dS Calculation In One Go?
3
8
9.5 years ago
Naren ▴ 950

I have over 1000 files, each containing 30 sequences. Manually aligning them, removing stop codons, and then calculating the average dN/dS for each file is not feasible for me.
Is there a way to do this from the command line?
(I know PAML, but I don't know of a tool that produces alignments in PAML format or one that removes stop codons.)
Even three different tools, one per step, would do. The only requirement is that I can run them from the command prompt.

(I'm on Win7)

paml • 13k views
1

What aligner would you like to use? Most, if not all, have a command-line interface. Deleting the stop codons afterwards should be "trivial". For example, Biopython has an interface for most aligners and will run PAML as well. However, please keep in mind that dN/dS calculations are (obviously) very dependent on a good alignment. The big downside of this automated approach is that you will likely not quality-check each alignment before moving on.
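To illustrate the "trivial" part, here is a minimal sketch in plain Python (no Biopython needed) that masks in-frame stop codons. It assumes the standard genetic code, DNA letters (not RNA), and a codon alignment in which the reading frame starts at the first column and gaps come in triplets. The function names `read_fasta` and `mask_stop_codons` are made up for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code, DNA alphabet


def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def mask_stop_codons(seq, replacement="---"):
    """Replace every in-frame stop codon with `replacement` (frame starts at column 1)."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return "".join(replacement if c.upper() in STOP_CODONS else c for c in codons)
```

For example, `mask_stop_codons("ATGTGAAAATAG")` gives `"ATG---AAA---"`, while a `TGA` that is out of frame (as in `"ATGAAA"`) is left alone. Replacing with gaps instead of deleting keeps all sequences in the alignment the same length.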

0

Hello, I have a similar problem. With a large data set, I must delete the stop codons from the sequences. Can you give me some suggestions?

Thanks!

7
2.7 years ago

Hello,

The MACSE_V2 toolkit provides several tools for working with nucleotide coding sequences. The alignSequences subprogram of MACSE builds reliable codon alignments even in the presence of frameshifts or stop codons (especially useful for dN/dS analysis and pseudogene analysis). Moreover, this subprogram can handle sequences that use different genetic codes. MACSE also includes a subprogram, exportAlignment, specifically designed to replace stop codons (and frameshift codons) in an alignment. It lets you specify the codon (three letters of your choice) that will replace the stop codons. You can even provide two different codons: one for stops appearing within a sequence (unexpected except in pseudogenes) and one for stop codons at the end of a sequence. While there are several options (e.g. to specify the output file name and the genetic code to use), the basic usage is quite straightforward:

java -jar macse.jar -prog exportAlignment -align align.fasta -codonForFinalStop --- -codonForInternalStop NNN
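With 1000+ files you would want to loop over them. Below is a rough Python sketch that only builds one such command per alignment file, using exactly the flags shown above. The jar location, the `*.fasta` pattern, and the helper names are assumptions for illustration; output naming is left to MACSE's defaults.

```python
from pathlib import Path


def macse_export_command(alignment, jar="macse.jar",
                         final_stop="---", internal_stop="NNN"):
    """Build the exportAlignment command for one alignment file.

    Only the flags from the command above are used; `jar` is an assumed path.
    """
    return [
        "java", "-jar", str(jar),
        "-prog", "exportAlignment",
        "-align", str(alignment),
        "-codonForFinalStop", final_stop,
        "-codonForInternalStop", internal_stop,
    ]


def commands_for_directory(directory, pattern="*.fasta"):
    """Return one command per file matching `pattern`, in sorted order."""
    return [macse_export_command(p) for p in sorted(Path(directory).glob(pattern))]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. Passing the arguments as a list avoids shell quoting problems with `---`, and it works the same way on Windows and Linux.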


To ease the alignment of coding nucleotide sequences, we also provide ready-to-use alignment pipelines (distributed as Singularity containers), which include optional filtering steps. These pipelines output the (filtered) nucleotide alignment, the corresponding (filtered) amino acid alignment, and the details of any filtering steps that were selected.

0

I tried the above command and got this error:

java.lang.StringIndexOutOfBoundsException: String index out of range: 1002

How do I solve this error?

2

You have to use the nucleotide (CDS) alignment file as input.
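If you want to catch a wrong input file before launching the tool, a quick pre-check along these lines may help. This is only a sketch: it does not reproduce MACSE's own validation, the allowed alphabet (`ACGTUN` plus `-`) is an assumption that will also flag IUPAC ambiguity codes, and `check_cds_alignment` is a made-up name.

```python
NUCLEOTIDE_CHARS = set("ACGTUN-")  # assumption: plain nucleotides, N, and gap characters


def check_cds_alignment(sequences):
    """Return a list of problems found in a {name: aligned_sequence} dict.

    An empty list means the input at least looks like a nucleotide codon alignment.
    """
    problems = []
    if len({len(s) for s in sequences.values()}) > 1:
        problems.append("sequences differ in length, so this is not an alignment")
    for name, seq in sequences.items():
        bad = set(seq.upper()) - NUCLEOTIDE_CHARS
        if bad:
            problems.append(f"{name}: non-nucleotide characters {sorted(bad)}")
        if len(seq) % 3:
            problems.append(f"{name}: length {len(seq)} is not a multiple of 3")
    return problems
```

A protein alignment passed in by mistake shows up immediately as "non-nucleotide characters".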

6
9.5 years ago

Hi,

Some time ago I had the same problem. I solved it using a Perl script available here. Since the script needs both the nucleotide and the amino acid sequences as input, I used T-Coffee to do the translation.

This has worked fine for me. I did it on Linux; on Windows you may need to write a .bat file to run it easily. Take a look at a .bat file tutorial for the syntax if you are not familiar with it. I think that will work.

You can use FASTA format for the PAML sequence file; there is no need for the .pml format.

Regards,

Joao

0

Thanks so much!!

0

I am calculating dN/dS for 4431 gene clusters. I tried to remove stop codons with the pal2nal program, but for some reason around 198 clusters still have stop codons.

I have now tried to remove the stop codons with the above script. I don't get any error, but the stop codons are not removed.

Any suggestions? Thank you.

0

I tried it with a dummy data set and it worked, but with the actual data set it does not work when I run the dN/dS calculation.

0

Hi, I'm also facing the same problem: I'm unable to get rid of the stop codons in my dataset.
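When stop codons survive a cleaning step, it can help to first see where they sit. A small diagnostic sketch in Python, assuming the standard genetic code and an alignment whose reading frame starts at the first column (`find_stop_codons` is a made-up name):

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code, DNA alphabet


def find_stop_codons(seq):
    """List in-frame stop codons in one aligned sequence.

    Returns (codon_position, codon, kind) tuples, where codon_position is 1-based
    and kind is "terminal" for the last non-gap codon, otherwise "internal".
    """
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3].upper() for i in range(0, usable, 3)]
    non_gap = [i for i, c in enumerate(codons) if c != "---"]
    last = non_gap[-1] if non_gap else None
    return [
        (i + 1, c, "terminal" if i == last else "internal")
        for i, c in enumerate(codons)
        if c in STOP_CODONS
    ]
```

If this reports nothing but PAML still complains, the stops are probably not in the frame you assume, for example because the alignment does not start at codon position 1 or gaps are not in multiples of three.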

5
9.5 years ago
SES 8.5k

Pal2Nal will generate a codon alignment without stop codons, given a MSA of proteins and the corresponding DNA sequences. If the input is a pairwise alignment, I believe it will calculate dN/dS ratios (using PAML) for you automatically. Otherwise, you can just input your alignments to PAML to calculate dN/dS. Pal2Nal is written in Perl, so it should work on your Win7 machine (I don't know about PAML though, unless there is a Windows version available).
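For many gene families, the main chore is pairing each protein alignment with its DNA file. A rough Python sketch, assuming matching file stems in two directories and the `.aln` / `.fasta` extensions (all hypothetical). The `-output paml` and `-nogap` options are written from memory of the pal2nal usage text, so please confirm them against `pal2nal.pl -h` before relying on this.

```python
from pathlib import Path


def pair_inputs(protein_dir, nucleotide_dir,
                protein_ext=".aln", nucleotide_ext=".fasta"):
    """Pair each protein alignment with the DNA file that has the same stem."""
    pairs = []
    for pep in sorted(Path(protein_dir).glob(f"*{protein_ext}")):
        nuc = Path(nucleotide_dir) / (pep.stem + nucleotide_ext)
        if nuc.exists():  # skip protein alignments without a DNA counterpart
            pairs.append((pep, nuc))
    return pairs


def pal2nal_command(protein_alignment, nucleotide_fasta, script="pal2nal.pl"):
    """Build a pal2nal call; flags are assumed from its usage text, verify locally."""
    return ["perl", str(script), str(protein_alignment), str(nucleotide_fasta),
            "-output", "paml", "-nogap"]
```

pal2nal prints its result to standard output, so when running each command with `subprocess.run` you would redirect `stdout` to one output file per pair.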

1

Should the nucleotide alignment be trimmed? And should the protein alignment be trimmed as well?