Tutorial: An easy way to run OrthoMCL (copy & paste)
8.4 years ago
Esaie ▴ 170

After spending a lot of time on the 13 steps of OrthoMCL, I finally managed to run it successfully.

The difficulty for me was not OrthoMCL itself but working out the usage of each command, because the documentation is not clear enough.

Here are the commands and parameters I used, following the OrthoMCL User's Guide: http://orthomcl.org/common/downloads/software/v2.0/UserGuide.txt

(1) I am using MySQL. You first need to create a database named orthomcl. This is my orthomcl.config.template file:

# this config assumes a mysql database named 'orthomcl'. adjust according
# to your situation.

dbVendor=mysql 
dbConnectString=dbi:mysql:orthomcl 
dbLogin=your_login
dbPassword=your_password
similarSequencesTable=SimilarSequencesorthomcl
orthologTable=Orthologorthomcl
inParalogTable=InParalogorthomcl
coOrthologTable=CoOrthologorthomcl
interTaxonMatchView=InterTaxonMatchorthomcl
percentMatchCutoff=50
evalueExponentCutoff=-5
oracleIndexTblSpc=NONE

(2) and (3) are obvious.

(4) On the command line, change to the root of the OrthoMCL directory and run this command; it will create the tables in your orthomcl database:

orthomclSoftware-v2.0.9$ bin/orthomclInstallSchema my_orthomcl_dir/orthomcl.config.template my_orthomcl_dir/install_schema.log

(5) According to the OrthoMCL guide, change to:

orthomclSoftware-v2.0.9$ cd my_orthomcl_dir/compliantFasta/

and then run orthomclAdjustFasta on each input file (cow.fasta, human.fasta and mouse.fasta):

orthomclSoftware-v2.0.9/my_orthomcl_dir/compliantFasta$ ../../bin/orthomclAdjustFasta cow /home/....../cow.fasta 1
orthomclSoftware-v2.0.9/my_orthomcl_dir/compliantFasta$ ../../bin/orthomclAdjustFasta hsa /home/....../human.fasta 1
orthomclSoftware-v2.0.9/my_orthomcl_dir/compliantFasta$ ../../bin/orthomclAdjustFasta mus /home/....../mouse.fasta 1

These commands will create 3 files in the compliantFasta directory, named cow.fasta, hsa.fasta and mus.fasta.
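A quick sanity check after this step, in case it helps: as far as I understand the User's Guide, every header in the compliantFasta files should look like >taxon|id, with a 3 or 4 character taxon code. The function below is only a sketch (check_compliant_headers is a made-up name, not an OrthoMCL tool) that prints any header that does not match that pattern.

```shell
# Hypothetical helper, not part of OrthoMCL.
# Prints every FASTA header in file $1 that is NOT of the form ">taxon|id"
# (taxon = 3 or 4 letters/digits, id = no spaces and no further "|").
# Returns 0 if all headers look compliant, 1 otherwise.
check_compliant_headers() {
    bad=$(grep '^>' "$1" | grep -Ev '^>[A-Za-z0-9]{3,4}\|[^ |]+$' || true)
    if [ -n "$bad" ]; then
        printf '%s\n' "$bad"
        return 1
    fi
    return 0
}
```

For example, check_compliant_headers cow.fasta prints nothing and returns 0 when every header is fine.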

(6)

orthomclSoftware-v2.0.9/my_orthomcl_dir/compliantFasta$ ../../bin/orthomclFilterFasta . 10 20

(7) Step 7: All-v-all BLAST

You first need to create a local BLAST database from the file:

orthomclSoftware-v2.0.9/my_orthomcl_dir/compliantFasta$  makeblastdb -in goodProteins.fasta -dbtype prot -out my_prot_blast_db

and

orthomclSoftware-v2.0.9/my_orthomcl_dir/compliantFasta$ blastall -p blastp -F 'm S' -v 100000 -b 100000 -z 55 -e 1e-5 -d my_prot_blast_db -i goodProteins.fasta -o out.tab -m 8

(8) In my case, I created a directory named 'blast' inside the compliantFasta directory and copied cow.fasta, hsa.fasta and mus.fasta into it:

orthomclSoftware-v2.0.9/my_orthomcl_dir/compliantFasta$ mkdir blast

and

orthomclSoftware-v2.0.9/my_orthomcl_dir/compliantFasta$ cp cow.fasta blast
orthomclSoftware-v2.0.9/my_orthomcl_dir/compliantFasta$ cp hsa.fasta blast
orthomclSoftware-v2.0.9/my_orthomcl_dir/compliantFasta$ cp mus.fasta blast

and

orthomclSoftware-v2.0.9/my_orthomcl_dir/compliantFasta$ ../../bin/orthomclBlastParser out.tab blast/ >> similarSequences.txt

(9)

orthomclSoftware-v2.0.9/my_orthomcl_dir/compliantFasta$ ../../bin/orthomclLoadBlast ../orthomcl.config.template similarSequences.txt

(10)

orthomclSoftware-v2.0.9/my_orthomcl_dir/compliantFasta$ ../../bin/orthomclPairs ../orthomcl.config.template ../orthomcl_pairs.log cleanup=no

(11)

orthomclSoftware-v2.0.9/my_orthomcl_dir/compliantFasta$ ../../bin/orthomclDumpPairsFiles ../orthomcl.config.template

(12)

orthomclSoftware-v2.0.9/my_orthomcl_dir/compliantFasta$ mcl mclInput --abc -I 1.5 -o mclOutput

(13)

orthomclSoftware/my_orthomcl_dir/compliantFasta$ ../../bin/orthomclMclToGroups cluster 1000 < mclOutput > groups.txt

This is what I did to run it successfully.

I hope this helps,

especially those who, like me, love copy and paste.

If you like it, please vote, like or comment.

all-v-all orthomcl • 19k views
ADD COMMENT

Hi Durelle. I would like to run OrthoMCL too, but I am quite confused by the analysis itself. I am still new to this, so a few things confuse me. Below are my questions and the error I ran into when I tried running OrthoMCL.

1.) DATA. I'm not really sure about the input data in the pre-processing stage (orthomclAdjustFasta). I've read in an article that the input file is a set of protein sequences in FASTA format. What I have is my de novo assembled transcriptome. Can I use that in OrthoMCL? What I actually want to find are the orthologs. Or do I need to run TransDecoder, convert the .pep file into FASTA format, and use that as my input file?

2.) I tried using my de novo transcriptome in orthomclAdjustFasta, but I got an error while running the analysis. If I can actually use my transcriptome data, it would be nice if you could shed some light on this error.

My assembled transcriptome contains >300,000 transcripts. A few of the first transcripts are:

TR1|c0_g2_i1 len=410 path=[443:0-227 459:228-257 457:258-287 448:288-311 449:312-409] [-1, 443, 459, 457, 448, 449, -2]

TR2|c0_g1_i1 len=270 path=[292:0-92 293:93-112 294:113-269] [-1, 292, 293, 294, -2]

TR3|c0_g2_i1 len=386 path=[436:0-290 445:291-307 443:308-308 441:309-347 427:348-385] [-1, 436, 445, 443, 441, 427, -2]

TR3|c1_g1_i1 len=254 path=[232:0-253] [-1, 232, -2]

When I ran orthomclAdjustFasta I got an error stating that TR3 is repeated (or something like that; see the example above), and the run terminated. I am running over ssh, and that is what I got in my .err output.

If I can use my data, what should I do to solve this error? I am thinking of renaming the repeated transcripts, but I am afraid that would jeopardize the result. Would that interfere with the BLAST run? What tool could you recommend to rename those repeated transcripts quickly? Renaming them individually is very tedious.

ADD REPLY

Hello stephravelo7!

I don't know if you have already fixed your problem. You must filter your .pep (protein) file and keep only unique (e.g. longest) isoforms, in FASTA format. The file extension doesn't matter as long as the file is correctly formatted. That is:

>c0_g2_i1
AAABBBCCC
>c0_g1_i1
AAABBBCCC
>c1_g1_i1
AAABBBCCC
...

That's what you should use as input for orthomclAdjustFasta.

Hope it can help :-)
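For headers like the ones in the question, one possible way to get unique IDs is to drop everything after the first space and replace the | with an underscore, so that TR3|c0_g2_i1 becomes TR3_c0_g2_i1. As far as I know, orthomclAdjustFasta splits the header on spaces and |, so field 1 is then the whole ID. This is only a sketch: flatten_headers is a hypothetical helper name, and it assumes a protein FASTA file with Trinity-style headers.

```shell
# Hypothetical helper, not part of OrthoMCL or Trinity.
# On header lines only: remove everything after the first space, then
# replace every "|" with "_". Sequence lines are passed through unchanged.
flatten_headers() {
    sed -e '/^>/ s/ .*$//' -e '/^>/ s/|/_/g' "$1"
}

# Usage (file names are placeholders):
#   flatten_headers my_proteins.pep > my_proteins.renamed.fasta
```

Since only the header lines change, the sequences themselves (and therefore the BLAST results) should not be affected.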

ADD REPLY


Hi Santiago,

Thanks for the answer. Anyway, I ran into this error in orthomclDumpPairsFiles:

DBD::mysql::st execute failed: Error writing file '/tmp/MYXqNMk7' (Errcode: 28 - No space left on device) at /home/.../orthomclDumpPairsFiles line 54, <F> line 14.

Do you have any suggestion on how to solve this? Thanks.

ADD REPLY

This tutorial was really useful for me! Thanks!

ADD REPLY

Great. Thanks for posting your experience

ADD REPLY

Hi, thanks for posting the streamlined help.

I was able to run everything up to step 8 and then, after submitting my command for step 9 I got the following error:

DBD::mysql::st execute failed: Row 1 doesn't contain data for all columns at /usr/local/bin/orthomclLoadBlast line 39, <F> line 13.

I am not very familiar with SQL. Can someone please help me figure out what is wrong?

My command line was :

/usr/local/bin/orthomclLoadBlast /usr/local/archive/orthomcl/orthomclSoftware-v2.0.9/bin/orthomclInstallSchema.config similarSequences.txt

And my config file looks like this:

dbVendor=mysql
dbConnectString=dbi:mysql:orthomcl:localhost:3307
dbLogin=orthomcl
dbPassword=orthomcl
similarSequencesTable=SimilarSequences
orthologTable=Ortholog
inParalogTable=InParalog
coOrthologTable=CoOrtholog
interTaxonMatchView=InterTaxonMatch
percentMatchCutoff=0
evalueExponentCutoff=0
oracleIndexTblSpc=what
blastResultsTable=BlastResults
ADD REPLY

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

ADD REPLY

Can you post the head of your similarSequences.txt? It seems that some fields are missing in that file. Perhaps also add the BLAST command you executed in one of the previous steps.

ADD REPLY

Thanks a lot! I could spot the error by looking at the similarSequences.txt file: I had a wrong header (several lines of "processing XX genome" just before the columns start). That was because I used nohup to redirect the output. Now the head of my similarSequences table looks like:

Atra|XP_013197910.1     Atra|XP_013197910.1     Atra    Atra    2       -77     100     100
Atra|XP_013197910.1     Prap|XP_022124259.1     Atra    Prap    6       -56     100     68.5
Atra|XP_013197910.1     Dple|OWR42513.1 Atra    Dple    6       -56     100     68.5
Atra|XP_013197910.1     Prap|XP_022124260.1     Atra    Prap    8       -56     100     68.5
Atra|XP_013197910.1     Prap|XP_022124258.1     Atra    Prap    8       -56     100     68.5
Atra|XP_013197910.1     Prap|XP_022124256.1     Atra    Prap    8       -56     100     68.5
Atra|XP_013197910.1     Bmor|XP_021203058.1     Atra    Bmor    8       -56     100     68.5
Atra|XP_013197910.1     Prap|XP_022124257.1     Atra    Prap    8       -56     100     68.5
Atra|XP_013197910.1     Bmor|XP_021203057.1     Atra    Bmor    1       -55     100     68.5
Atra|XP_013197910.1     Vtam|XP_026498474.1     Atra    Vtam    1       -55     98.9    68.5
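If stray log lines end up in similarSequences.txt (as happened here with nohup), one simple filter is to keep only rows with exactly 8 tab-separated columns, which is the layout of the table above. keep_blast_rows is a made-up name; this is a sketch, not part of OrthoMCL.

```shell
# Hypothetical helper: print only the lines of file $1 that have exactly
# 8 tab-separated fields; anything else (e.g. "processing XX genome") is dropped.
keep_blast_rows() {
    awk -F '\t' 'NF == 8' "$1"
}

# Usage (output file name is a placeholder):
#   keep_blast_rows similarSequences.txt > similarSequences.clean.txt
```

It is probably still worth checking how many lines were dropped, for example by comparing wc -l on both files.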

Everything worked fine now but at the end only 2 of the 6 genomes I added are in the final tables (Only Atra and Prap). My config file looks like this now, could that be the issue?

dbVendor=mysql
dbConnectString=dbi:mysql:orthomcl:localhost:3307
dbLogin=orthomcl
dbPassword=orthomcl
similarSequencesTable=SimilarSequences
orthologTable=Ortholog
inParalogTable=InParalog
coOrthologTable=CoOrtholog
interTaxonMatchView=InterTaxonMatch
percentMatchCutoff=50
evalueExponentCutoff=-5
oracleIndexTblSpc=NONE
blastResultsTable=BlastResults
/usr/local/bin/orthomclInstallSchema.config (END)

Many thanks again!

ADD REPLY

I am also stuck at step 8 with a MySQL problem:

DBD::mysql::st execute failed: Loading local data is disabled; this must be enabled on both the client and server sides at ../orthomclLoadBlast line 39, <F> line 9.

ADD REPLY

This sounds like a configuration issue on your MySQL server. Most likely you will need to ask your local sysadmin to enable this for you (so that you are able and allowed to load data into the databases).
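On a MySQL server where you do have admin rights, the server-side setting behind this message is the local_infile system variable. A minimal sketch, assuming the mysql command-line client and an account that is allowed to change global variables (enable_local_infile is a hypothetical helper name):

```shell
# Hypothetical helper: switch on local_infile on the server.
# $1 = MySQL user with permission to set global variables (e.g. root).
# SET GLOBAL does not survive a server restart.
enable_local_infile() {
    mysql -u "$1" -p -e "SET GLOBAL local_infile = 1;"
}

# Usage:
#   enable_local_infile root
```

The client side is controlled separately; with DBD::mysql that is the mysql_local_infile=1 option in the connect string, which is what another comment in this thread tried.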

ADD REPLY

Could you please let me know if you have fixed this problem? I have the same issue here. Thank you!

ADD REPLY

Thank you very much! It was very useful and easy to follow

ADD REPLY
6.6 years ago
hudak7 • 0

When I run orthomclLoadBlast it says my SimilarSequences table is full. Any idea what this means? I have plenty of free disk space.

ADD COMMENT

You can start a new question; those have a higher likelihood of getting answered.

Sounds like you still have data from a previous run of orthomclLoadBlast in the MySQL database. If you're sure that's your data then you can skip that step. I usually end up DROPping the whole database before running all OrthoMCL steps, like this, where the MySQL username is root:

mysql -u root -p

DROP DATABASE orthomcl;

CREATE DATABASE orthomcl;

Then I run everything from the start.

ADD REPLY

Hi,

I am following the instructions and my analysis is running well until in step 9.

orthomclLoadBlast

$ orthomclLoadBlast orthomcl.config similarSequences.txt

Error: DBD::mysql::st execute failed: Data too long for column 'SUBJECT_ID' at row 1567
ADD REPLY

The name of one or more genes are too long for the database!

Two ways to fix this:

  • write a small script (using Biopython or BioPerl) that cuts your IDs down to below 60 characters

  • in the OrthoMCL script orthomclInstallSchema, change these two lines:

    QUERY_ID                 VARCHAR(60),
    SUBJECT_ID               VARCHAR(60),
    

to something like

 QUERY_ID                 VARCHAR(120),
 SUBJECT_ID               VARCHAR(120),

and rerun the script (you will probably encounter other database columns which are too short)
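Before changing anything, it may help to see which IDs are actually too long. This sketch lists every sequence ID in a FASTA file that is longer than a given limit (60 by default, matching the VARCHAR(60) above). long_ids is a hypothetical helper name, not an OrthoMCL tool.

```shell
# Hypothetical helper: print each sequence ID in FASTA file $1 whose length
# exceeds $2 characters (default 60). The ID is taken as the first
# whitespace-delimited token of the header, without the leading ">".
long_ids() {
    awk -v max="${2:-60}" '/^>/ { id = substr($1, 2); if (length(id) > max) print id }' "$1"
}

# Usage:
#   long_ids goodProteins.fasta        # IDs longer than 60 characters
#   long_ids goodProteins.fasta 120    # IDs longer than 120 characters
```

If you shorten IDs instead of widening the columns, check that the shortened IDs are still unique.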

ADD REPLY

After checking my similarSequences.txt, I think the problem is not the number of characters in the InstallSchema script but the output generated by blastall with the command above. The database size should be specified with the -z option. In the command above, -z is 55 because that is the database size of the author's goodProteins.fasta. To find the database size of your own data, I use this command: grep ">" goodProteins.fasta | wc. The first number is the database size, which you will need when you run blastall.
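The count described above can also be taken with grep -c, which prints only the number of header lines. A small sketch (count_seqs is a made-up name):

```shell
# Hypothetical helper: print the number of sequences (header lines starting
# with ">") in FASTA file $1.
count_seqs() {
    grep -c '^>' "$1"
}

# Usage, plugging the count into the -z option as suggested in this comment:
#   blastall -p blastp ... -z "$(count_seqs goodProteins.fasta)" ...
```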

ADD REPLY

Hi Philipp,

Thanks for the help with my previous question. I have run into another error, in orthomclDumpPairsFiles.

DBD::mysql::st execute failed: Error writing file '/tmp/MYza1d1c' (Errcode: 28 - No space left on device) at /home/.../bin/orthomclDumpPairsFiles line 54, <F> line 14.

Do you have any idea how to solve this? Thanks!

ADD REPLY

Your hard-drive is full, delete unnecessary stuff :)

ADD REPLY

This is going to be a lame question, but how do I check the free space on my hard drive? Anyway, I can't delete files at the moment, so I am wondering if I can have the file written to another folder instead of /tmp/MYza1d1c. I have a folder with 613G, so I set its path as tmpdir in mysqld.cnf. However, when I restarted MySQL for the change to take effect, I got an error:

Job for mysql.service failed because the control process exited with error code. See "systemctl status mysql.service" and "journalctl -xe" for details.

I am really new to this so I am sorry if I'm asking too many questions.

ADD REPLY

You can use

df -h

to see how much space you have left on each device.

There's a chance that the mysql process is not allowed to write to the folder with 613G, so look at the rights of that folder using ls -lah

ADD REPLY

I am having a problem at step 8. I am getting this message over and over again.

DBD::mysql::st execute failed: The used command is not allowed with this MySQL version at .......orthomclSoftware-v2.0.9/bin/orthomclLoadBlast line 39, <f> line 14.

My command was:

...bin/orthomclLoadBlast .....orthomclSoftware-v2.0.9/my_orthomcl_dir/orthomcl.config.template ....orthomclSoftware-v2.0.9/my_orthomcl_dir/compliantFasta/similarSequences

I have installed mysql-8.0.11-macos10.13-x86_64.dmg in MacOS High Sierra 10.13.6

I have tried to add:

dbConnectString=dbi:mysql:orthomcl:mysql_local_infile=1

to my orthomcl.config.template file. Still not working.

Any idea how to solve this?

Thanks.

ADD REPLY

Hi, it's kind of late, but it looks to me like an incompatibility between the MySQL and OrthoMCL versions you used. See this link (C: OrthoMCL installation on Ubuntu Linux) for an OrthoMCL installation that works with the commands above. Good luck!

ADD REPLY

Hi Philipp,

Any idea why one would get this error message at step 4 (orthomclInstallSchema)?

orthomclInstallSchema my_orthomcl_dir/orthomcl.config my_orthomcl_dir/install_schema.log

Can't locate DBD/mysql.pm in @INC (you may need to install the DBD::mysql module)

here is my orthomcl.config file:

dbVendor=mysql 
dbConnectString=dbi:mysql:mmax_orthomcl
dbLogin=mmax
dbPassword=PASSWORD
similarSequencesTable=SimilarSequences
orthologTable=Ortholog
inParalogTable=InParalog
coOrthologTable=CoOrtholog
interTaxonMatchView=InterTaxonMatch
percentMatchCutoff=50
evalueExponentCutoff=-3
oracleIndexTblSpc=NONE

I tried manually installing DBD::mysql using code below, and I get another error about Devel::CheckLib

[DBD-mysql-4.050]$ perl Makefile.PL

Can't locate Devel/CheckLib.pm in @INC (you may need to install the Devel::CheckLib module)

Do you have any idea how to solve this? thank you.

ADD REPLY

That's a Perl error - you need to install Devel::CheckLib and DBD::mysql in your Perl installation, using either cpanm or cpan - have a look at the OrthoMCL manual, the required Perl libraries should be listed there
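With cpanm, installing the two modules named in the error messages would look roughly like this. It is a sketch only: install_orthomcl_perl_deps is a hypothetical name, it assumes cpanm is already installed, and DBD::mysql usually also needs the MySQL client development files to be present on the system.

```shell
# Hypothetical helper: install the Perl modules named in the two
# "Can't locate ... in @INC" errors, using cpanm.
install_orthomcl_perl_deps() {
    cpanm Devel::CheckLib DBD::mysql
}

# Usage:
#   install_orthomcl_perl_deps
```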

ADD REPLY

Thank you Philipp! May I ask two more questions? When I use makeblastdb in step 7, I don't get a .fasta file as part of the output, only a .psq, a .phr and a .pin file. I'm not sure if this is normal, as in the steps above they seem to get a .fasta file.

Also, how does one tell blastp to perform an "all vs all" blastp analysis? "blastall" is not recognized on my system.

thanks for any input!

ADD REPLY

That's the normal output of makeblastdb. It just makes the database for you. After that, you need to run blastp against it. blastall is part of the old (legacy) BLAST version; the new one (BLAST+) does not have the blastall command. I think it's okay to just run blastp.
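For what it's worth, here is my understanding of how the step 7 blastall command maps onto BLAST+. It is a sketch, not a tested drop-in replacement: the function name all_vs_all_blastp is made up, it assumes makeblastdb and blastp are on your PATH, and I have treated -F 'm S' as -seg yes -soft_masking true and -m 8 as -outfmt 6. The -z value would correspond to -dbsize, which I have left out here.

```shell
# Hypothetical helper: all-vs-all protein BLAST with BLAST+.
# $1 = input FASTA (e.g. goodProteins.fasta), $2 = tabular output file.
# The database name my_prot_blast_db is taken from the tutorial above.
all_vs_all_blastp() {
    fasta=$1
    out=$2
    makeblastdb -in "$fasta" -dbtype prot -out my_prot_blast_db &&
    blastp -query "$fasta" -db my_prot_blast_db \
        -seg yes -soft_masking true \
        -evalue 1e-5 -max_target_seqs 100000 \
        -outfmt 6 -out "$out"
}

# Usage:
#   all_vs_all_blastp goodProteins.fasta out.tab
```

Searching the same file that was used to build the database is what makes it "all vs all".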

ADD REPLY

That makes sense. thank you for the reply!

ADD REPLY
6.2 years ago
ashaneev07 ▴ 40

Question: orthomcl error: DBD::mysql::st execute failed: The table 'SimilarSequences' is full.....

Hi... When I use orthomclLoadBlast, I get an error message. Can anyone give me suggestions? Thanks.

home@home-Lenovo-H30-50:~/orthomclSoftware-v2.0.9/myOrthoMCL/ComplaintFasta$ orthomclLoadBlast /etc/orthomcl.config similarSequences.txt
DBD::mysql::st execute failed: The table 'SimilarSequences' is full at /usr/local/bin/orthomclLoadBlast line 39, <F> line 14.
ADD COMMENT

Is this the first time you (try to) run this? If not, you might first need to empty the existing table (the procedure will fail if the table exists and is not empty).

ADD REPLY

No. This is the second run. So I have to remove the orthomcl database, right?

ADD REPLY

Not the complete database, just empty it (or even just that specific table)
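Emptying just that one table could look like the sketch below. It assumes the mysql command-line client; empty_similar_sequences is a hypothetical helper name, and the table name should match similarSequencesTable in your config file (SimilarSequences is only the default used here).

```shell
# Hypothetical helper: empty one table without dropping the database.
# $1 = MySQL user, $2 = database name, $3 = table name (default: SimilarSequences).
# TRUNCATE TABLE removes all rows but keeps the table definition.
empty_similar_sequences() {
    mysql -u "$1" -p "$2" -e "TRUNCATE TABLE ${3:-SimilarSequences};"
}

# Usage:
#   empty_similar_sequences root orthomcl
#   empty_similar_sequences root orthomcl SimilarSequencesorthomcl
```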

ADD REPLY

No. Do not delete the orthomcl database; you just need to flush the existing table from your first run and then reconnect to MySQL. The instructions should be in the OrthoMCL manual.

ADD REPLY

You can of course drop the whole database, but then you will need to re-create it (and thus go one step back in the whole OrthoMCL process). As stated before, there is normally no need to do this.

ADD REPLY

Thanks for the reply..

I dropped the orthomcl database and recreated it. Then it ran smoothly. But the next step again shows an error.

home@home-Lenovo-H30-50:~/orthomclSoftware-v2.0.9/myOrthoMCL/ComplaintFasta$ orthomclPairs /etc/orthomcl.config orthomcl_pairs.log cleanup=no

DBD::mysql::st execute failed: The total number of locks exceeds the lock table size at /usr/local/bin/orthomclPairs line 709, <F> line 14.
ADD REPLY
4.4 years ago
hypeanut • 0

Hi, I am at step 7:

$ blastall -p blastp -F 'm S' -v 100000 -b 100000 -z 55 -e 1e-5 -d my_prot_blast_db -i goodProteins.fasta -o out.tab -m 8

After I ran this, the terminal reported:

Warning: [blastp] The parameter -num_descriptions is ignored for output formats > 4 . Use -max_target_seqs to control output

Could someone please tell me if I should change anything in the command, or just run it without changing anything? Thank you!

ADD COMMENT

This is due to changes in the newer versions of BLAST. In theory you can ignore this (BLAST will automagically change it for you), but better practice is to change the command line to use the correct parameter.

So change the -v 100000 -b 100000 part to -max_target_seqs 100000.

ADD REPLY
4.0 years ago
Buffo ★ 2.4k

Hi, I have installed OrthoMCL using conda. Does anybody know how to set/get the login and password (of MySQL)?

ADD COMMENT
