Problem with script to automate codeml
1
2
Entering edit mode
6.6 years ago
ThulasiS ▴ 90

Dear All

I am trying to use simple for loop in shell script to execute bacth process of codeml to calculate dn/ds values for 1500 Orthologues but I am getting this error.

#!/bin/bash

for file in  *.aln
  do

`echo $file | sed 's/\.aln//'`

 python codemlScript.py /home/tulasi/Desktop/Tools/Tools/AutoPAML/Alignments/\.aln /home/tulasi/Desktop/Tools/Tools/AutoPAML/Trees/\.tree /home/tulasi/Desktop/Tools/Tools/AutoPAML/Out/\.out

done

But I am getting this out put with error

Saureus1202.aln
Saureus1202.aln
Saure_omega.out

25           kappa | kappa                  1.60
16           model | model                  0.00
CODONML in paml version 4.8, March 2014

----------------------------------------------
Phe F TTT | Ser S TCT | Tyr Y TAT | Cys C TGT
  TTC |       TCC |       TAC |       TGC
Leu L TTA |       TCA | *** * TAA | *** * TGA
  TTG |       TCG |       TAG | Trp W TGG
  ----------------------------------------------
Leu L CTT | Pro P CCT | His H CAT | Arg R CGT
  CTC |       CCC |       CAC |       CGC
  CTA |       CCA | Gln Q CAA |       CGA
  CTG |       CCG |       CAG |       CGG
----------------------------------------------
Ile I ATT | Thr T ACT | Asn N AAT | Ser S AGT
  ATC |       ACC |       AAC |       AGC
  ATA |       ACA | Lys K AAA | Arg R AGA 
Met M ATG |       ACG |       AAG |       AGG
 ----------------------------------------------
Val V GTT | Ala A GCT | Asp D GAT | Gly G GGT
  GTC |       GCC |       GAC |       GGC
  GTA |       GCA | Glu E GAA |       GGA
  GTG |       GCG |       GAG |       GGG
 ----------------------------------------------
Nice code, uuh?


Ambiguity character definition table:

T (1): T 
C (1): C 
A (1): A 
G (1): G 
U (1): T 
Y (2): T C 
R (2): A G 
M (2): C A 
K (2): T G 
S (2): C G 
W (2): T A 
H (3): T C A 
B (3): T C G 
V (3): C A G 
D (3): T A G 
- (4): T C A G 
N (4): T C A G 
? (4): T C A G 

processing fasta file
reading seq# 1 A27919                                                1359 sites
reading seq# 2 A15262                                                1359 sites
reading seq# 3 A23133                                                1359 sites
reading seq# 4 A10245                                                1359 sites
reading seq# 5 A2708                                                 1359 sites
 reading seq# 6 A12710                                                1359 sites
 reading seq# 7 A5063                                                 1359 sites
 reading seq# 8 A1                                                    1359 sites
 reading seq# 9 A7706                                                 1359 sites
 reading seq#10 A20825                                                1359 sites
  reading seq#11 A17684                                                1359 sites
  reading seq#12 A25250                                                1359 sites
  ns = 12   ls = 1359
  Reading sequences, sequential format..
  Reading seq #12: A25250     
  Sequences read..
  Counting site patterns..  0:00
      71 patterns at      453 /      453 sites (100.0%),  0:00
  Counting codons..
  NG distances for seqs.:
  1
  2   1:Sites   298.7 + 1060.3 = 1359.0 Diffs    11.0 +    1.0 =   12.0   2
  3   1:Sites   298.7 + 1060.3 = 1359.0 Diffs     7.0 +    1.0 =    8.0
  3   2:Sites   298.7 + 1060.3 = 1359.0 Diffs     4.0 +    0.0 =    4.0   3
  4   1:Sites   298.7 + 1060.3 = 1359.0 Diffs     7.0 +    1.0 =    8.0
  4   2:Sites   298.7 + 1060.3 = 1359.0 Diffs     4.0 +    0.0 =    4.0
  4   3:Sites   298.7 + 1060.3 = 1359.0 Diffs     0.0 +    0.0 =    0.0   4
  5   1:Sites   298.7 + 1060.3 = 1359.0 Diffs     7.0 +    1.0 =    8.0
  5   2:Sites   298.7 + 1060.3 = 1359.0 Diffs     4.0 +    0.0 =    4.0
  5   3:Sites   298.7 + 1060.3 = 1359.0 Diffs     0.0 +    0.0 =    0.0
  5   4:Sites   298.7 + 1060.3 = 1359.0 Diffs     0.0 +    0.0 =    0.0   5
  6   1:Sites   298.8 + 1060.2 = 1359.0 Diffs     0.0 +    0.0 =    0.0
  6   2:Sites   298.7 + 1060.3 = 1359.0 Diffs    11.0 +    1.0 =   12.0
  6   3:Sites   298.7 + 1060.3 = 1359.0 Diffs     7.0 +    1.0 =    8.0
  6   4:Sites   298.7 + 1060.3 = 1359.0 Diffs     7.0 +    1.0 =    8.0
  6   5:Sites   298.7 + 1060.3 = 1359.0 Diffs     7.0 +    1.0 =    8.0   6
  7   1:Sites   298.7 + 1060.3 = 1359.0 Diffs     4.0 +    1.0 =    5.0
  7   2:Sites   298.7 + 1060.3 = 1359.0 Diffs    11.0 +    0.0 =   11.0
  7   3:Sites   298.7 + 1060.3 = 1359.0 Diffs     7.0 +    0.0 =    7.0
  7   4:Sites   298.7 + 1060.3 = 1359.0 Diffs     7.0 +    0.0 =    7.0
  7   5:Sites   298.7 + 1060.3 = 1359.0 Diffs     7.0 +    0.0 =    7.0
  7   6:Sites   298.7 + 1060.3 = 1359.0 Diffs     4.0 +    1.0 =    5.0   7
  8   1:Sites   298.7 + 1060.3 = 1359.0 Diffs    10.0 +    1.0 =   11.0
  8   2:Sites   298.7 + 1060.3 = 1359.0 Diffs     3.0 +    0.0 =    3.0
  8   3:Sites   298.7 + 1060.3 = 1359.0 Diffs     3.0 +    0.0 =    3.0
  8   4:Sites   298.7 + 1060.3 = 1359.0 Diffs     3.0 +    0.0 =    3.0
  8   5:Sites   298.7 + 1060.3 = 1359.0 Diffs     3.0 +    0.0 =    3.0
  8   6:Sites   298.7 + 1060.3 = 1359.0 Diffs    10.0 +    1.0 =   11.0
  8   7:Sites   298.6 + 1060.4 = 1359.0 Diffs    10.0 +    0.0 =   10.0   8
  9   1:Sites   298.7 + 1060.3 = 1359.0 Diffs     3.0 +    1.0 =    4.0
  9   2:Sites   298.7 + 1060.3 = 1359.0 Diffs    10.0 +    0.0 =   10.0
  9   3:Sites   298.7 + 1060.3 = 1359.0 Diffs     6.0 +    0.0 =    6.0
  9   4:Sites   298.7 + 1060.3 = 1359.0 Diffs     6.0 +    0.0 =    6.0
  9   5:Sites   298.7 + 1060.3 = 1359.0 Diffs     6.0 +    0.0 =    6.0
  9   6:Sites   298.7 + 1060.3 = 1359.0 Diffs     3.0 +    1.0 =    4.0
 9   7:Sites   298.6 + 1060.4 = 1359.0  Diffs     1.0 +    0.0 =    1.0
 9   8:Sites   298.6 + 1060.4 = 1359.0  Diffs     9.0 +    0.0 =    9.0   9
10   1:Sites   298.7 + 1060.3 = 1359.0  Diffs     6.0 +    1.0 =    7.0
10   2:Sites   298.7 + 1060.3 = 1359.0  Diffs     7.0 +    0.0 =    7.0
10   3:Sites   298.7 + 1060.3 = 1359.0  Diffs     3.0 +    0.0 =    3.0
10   4:Sites   298.7 + 1060.3 = 1359.0  Diffs     3.0 +    0.0 =    3.0
10   5:Sites   298.7 + 1060.3 = 1359.0  Diffs     3.0 +    0.0 =    3.0
10   6:Sites   298.7 + 1060.3 = 1359.0  Diffs     6.0 +    1.0 =    7.0
10   7:Sites   298.7 + 1060.3 = 1359.0  Diffs     6.0 +    0.0 =    6.0
10   8:Sites   298.7 + 1060.3 = 1359.0  Diffs     6.0 +    0.0 =    6.0
10   9:Sites   298.7 + 1060.3 = 1359.0  Diffs     5.0 +    0.0 =    5.0  10
11   1:Sites   298.7 + 1060.3 = 1359.0  Diffs     8.0 +    2.0 =   10.0
11   2:Sites   298.7 + 1060.3 = 1359.0  Diffs    11.0 +    1.0 =   12.0
11   3:Sites   298.7 + 1060.3 = 1359.0  Diffs     7.0 +    1.0 =    8.0
11   4:Sites   298.7 + 1060.3 = 1359.0  Diffs     7.0 +    1.0 =    8.0
11   5:Sites   298.7 + 1060.3 = 1359.0  Diffs     7.0 +    1.0 =    8.0
11   6:Sites   298.7 + 1060.3 = 1359.0  Diffs     8.0 +    2.0 =   10.0
11   7:Sites   298.7 + 1060.3 = 1359.0  Diffs     6.0 +    1.0 =    7.0
11   8:Sites   298.7 + 1060.3 = 1359.0  Diffs    10.0 +    1.0 =   11.0
11   9:Sites   298.7 + 1060.3 = 1359.0  Diffs     5.0 +    1.0 =    6.0
11  10:Sites   298.7 + 1060.3 = 1359.0  Diffs     6.0 +    1.0 =    7.0  11
12   1:Sites   298.8 + 1060.2 = 1359.0  Diffs     0.0 +    0.0 =    0.0
12   2:Sites   298.7 + 1060.3 = 1359.0  Diffs    11.0 +    1.0 =   12.0
12   3:Sites   298.7 + 1060.3 = 1359.0  Diffs     7.0 +    1.0 =    8.0
12   4:Sites   298.7 + 1060.3 = 1359.0  Diffs     7.0 +    1.0 =    8.0
12   5:Sites   298.7 + 1060.3 = 1359.0  Diffs     7.0 +    1.0 =    8.0
 12   6:Sites   298.8 + 1060.2 = 1359.0 Diffs     0.0 +    0.0 =    0.0
12   7:Sites   298.7 + 1060.3 = 1359.0  Diffs     4.0 +    1.0 =    5.0
12   8:Sites   298.7 + 1060.3 = 1359.0  Diffs    10.0 +    1.0 =   11.0
 12   9:Sites   298.7 + 1060.3 = 1359.0 Diffs     3.0 +    1.0 =    4.0
12  10:Sites   298.7 + 1060.3 = 1359.0  Diffs     6.0 +    1.0 =    7.0
 12  11:Sites   298.7 + 1060.3 = 1359.0 Diffs     8.0 +    2.0 =   10.0  12

  528 bytes for distance
69296 bytes for conP
    0 bytes for fhK
 5000000 bytes for space
 . (Continuation of codeml output)
  .
   .   
 w =  0.02373 dN =  0.00000 dS =  0.00001 d4 =  0.00000 (189.6 four-fold sites)
         dN*=  0.00000 dS*=  0.00001 S* = 315.11 N* =1043.89
 end of tree file.

Time used:  0:10
Saureus1203.aln
Saureus1203.aln
 Saure_omega.out
 Traceback (most recent call last):
   File "codemlScript.py", line 45, in <module>
   results = cml.run(verbose=True)
    File "/usr/local/lib/python2.7/dist-packages/Bio/Phylo/PAML/codeml.py", line 185, in run
raise IOError("The specified tree file does not exist.")
  IOError: The specified tree file does not exist.

This is the pyhthon script I got from github

 #!/usr/bin/python

  ###################################################################################################################
#This script was written by Nathan Whelan.

# THIS SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS  
# OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF  MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE CONTRIBUTORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
# WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF
# OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH THE
# SOFTWARE.
##########################################################################################################################

##This script utilizes codeml in PAML by Ziheng Yang. If you use this script please also cite PAML.

##BIOPYTHON IS REQUIRED FOR THIS SCRIPT!

##This script can be used to automate running the codeml PAML package(e.g. if you have hundreds of genes you want to fit to a model).
#A codeml ctl file should be in your working directory with the parameters you wish to use.

from __future__ import division
from Bio.Phylo.PAML import codeml ##Biopython PAML
import sys

#This program takes three inputs: the alignment in phylip format, a treefile in Newick format, and outputfile name
#It returns a nested dictionary with various results depending on analysis
if len(sys.argv) != 4:
print "Error.  There should be three inputs. An alignment in pyhlip format, a tree file, and the name for the     output file"
quit()

cml = codeml.Codeml()
cml.read_ctl_file("codeml.ctl") ##CTL File. See PAML manual for format.
cml.alignment = sys.argv[1]
cml.tree = sys.argv[2]
cml.out_file = sys.argv[3]

name=sys.argv[1]
print name
name=sys.argv[1]
print name
name2=name[0:5]+"omega.out"
print name2
results1 = cml.run(verbose=True)

I would like to know where I am doing mistake..

I guess now we need small modification in python script but i don't know where. Please try to solve

Thank you

python shell programming • 3.2k views
ADD COMMENT
0
Entering edit mode

Why are you referring to your files like:

/Trees/\.tree? You aren't passing the $file variable in to the python function? Is the tree file one of your inputs or one of the script outputs? (Sorry haven't read the code fully as its too early here :P )

Your script isn't finding the tree file, which tells you there's something wrong with how you're specifying the file path. Using the .tree (I would guess) is creating a hidden file called .tree

Just try:

for file in *.aln ; do python ....blah blah blah.... $file ; done

ADD REPLY
3
Entering edit mode
6.6 years ago

try

ls *.phy.aln |  sed 's/_DNA.phy.aln//' | while read NAME; do python codemlScript.py "${NAME}_DNA.phy.aln" "${NAME}_dropped.tre" "${NAME}_PAML_NeutralMethod1.out" ; done
ADD COMMENT
0
Entering edit mode

This was the error I got after your modification

Saureus1202.phy.aln_.aln Saureus1202.phy.aln_.aln Saure_omega.out Traceback (most recent call last): File "codemlScript.py", line 45, in <module> results = cml.run(verbose=True) File "/usr/local/lib/python2.7/dist-packages/Bio/Phylo/PAML/codeml.py", line 185, in run raise IOError("The specified tree file does not exist.") IOError: The specified tree file does not exist. Saureus1203.aln_.aln Saureus1203.aln_.aln Saure_omega.out Traceback (most recent call last): File "codemlScript.py", line 45, in <module> results = cml.run(verbose=True) File "/usr/local/lib/python2.7/dist-packages/Bio/Phylo/PAML/codeml.py", line 185, in run raise IOError("The specified tree file does not exist.") IOError: The specified tree file does not exist.

ADD REPLY
0
Entering edit mode

hum.. show me the output of

ls Saureus1203.*.phy.aln  | head

please

ADD REPLY
0
Entering edit mode

Hi Pierrie, now I modified question and description, I missed that '_' in script but still error is there. Sorry for disturbing you. Please check my query. Thank you

ADD REPLY
0
Entering edit mode

Output of ls *.phy. aln is

Saureus1202.phy.aln 
Saurues1203.phy.aln
ADD REPLY
2
Entering edit mode

there is no underscore in that list. So why are you using those underscores in your shell ? (...) sed 's/\_DNA.phy.aln//' (... ) python (...)_DNA.phy`

ADD REPLY
0
Entering edit mode

Oh.. yes.. I didn't notice that but even after modifying script also the calculation was going but ended with this error again Time used: 0:11 Saureus1203.aln Saureus1203.aln Saure_omega.out Traceback (most recent call last): File "codemlScript.py", line 45, in <module> results = cml.run(verbose=True) File "/usr/local/lib/python2.7/dist-packages/Bio/Phylo/PAML/codeml.py", line 185, in run raise IOError("The specified tree file does not exist.") IOError: The specified tree file does not exist.

ADD REPLY

Login before adding your answer.

Traffic: 3022 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6