Question: Problems with the orthomclBlastParser script
RT (European Union) wrote, 7.2 years ago:

Hi All,

I am trying to find true orthologs for a set of sequences using the OrthoMCL program. I made it up to step 8, orthomclBlastParser, and provided my BLAST output in -m 8 format. When I run orthomclBlastParser, it asks for the taxon ID of the subject sequences. I modified my BLAST output file by adding an ID to the subject sequences, like 'xxx|YYYYYY', but I still get the same error.

Can someone help me with this?

Thanks, R.

Below are a few lines from my BLAST output file, followed by the error given by orthomclBlastParser.

BLAST output:

BLASTP 2.2.26+

Query: ppp|scf8123

Database: aPD3R_pep

Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score

100 hits found

ppp|scf8123 xxx|Pd3R61150.1|PACid:197844 56.00 800 239 15 9 804 13 703 0.0 824

ppp|scf8123 xxx|Pd3R61150.1|PACid:197366 95.88 826 32 2 1 824 1 826 0.0 1557

Error:

acquiring genes from ppp.fasta couldn't find taxon for gene 'xxx|Pd3R61150.1|PACid:197844' at /Downloads/orthomclSoftware-v2.0.2/bin/orthomclBlastParser line 103, line 1.

Please note that I removed the first 5 lines of the output file; otherwise I get the error: couldn't find taxon for gene 'BLASTP' at /Downloads/orthomclSoftware-v2.0.2/bin/orthomclBlastParser line 103, line 1.

Vitis (New York) wrote, 7.2 years ago:

I think you should keep only 'xxx|Pd3R61150.1' from 'xxx|Pd3R61150.1|PACid:197844' in the protein IDs. If the manual says the names should be in this form, I think they mean it strictly. At least, I followed that convention and it worked.
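The trimming suggested above could be done with a short script rather than by hand. This is only a minimal sketch: the function names `trim_id` and `clean_blast_lines` are made up for illustration, and it assumes a tab-separated, 12-column BLAST table in which every ID starts with `taxon|gene`. Lines that do not have 12 columns (such as the BLAST header lines) are skipped instead of being deleted manually.

```python
def trim_id(seq_id):
    """Keep only the first two '|'-separated fields, e.g. 'xxx|Pd3R61150.1'."""
    return "|".join(seq_id.split("|")[:2])


def clean_blast_lines(lines):
    """Yield 12-column BLAST rows with trimmed query and subject IDs.

    Any line without exactly 12 tab-separated columns (headers, blank
    lines) is skipped.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 12:
            continue
        fields[0] = trim_id(fields[0])
        fields[1] = trim_id(fields[1])
        yield "\t".join(fields)


if __name__ == "__main__":
    # Sample rows modeled on the output posted in the question.
    sample = [
        "# BLASTP 2.2.26+",
        "ppp|scf8123\txxx|Pd3R61150.1|PACid:197844\t56.00\t800\t239\t15\t9\t804\t13\t703\t0.0\t824",
    ]
    for row in clean_blast_lines(sample):
        print(row)
```

The same trimming would have to be applied to the FASTA headers so that both files agree, and the trimmed IDs must stay unique within each taxon. In the posted output, two hits differ only in their PACid, so it is worth checking for collisions after trimming.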

SES (Vancouver, BC) wrote, 7.2 years ago:

Are you also giving orthomclBlastParser the path to the directory of "adjusted" FASTA files (as defined by OrthoMCL) as the second argument? That directory should contain "ppp.fasta" and "xxx.fasta", plus a file for any additional taxon you are analyzing. I agree with Vitis that you should follow the directions explicitly, because modifying the headers will likely break the parser or introduce some other unintended effect downstream.
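One way to check this before running the parser is to compare the IDs in the BLAST table against the headers in the FASTA directory. The sketch below is a hypothetical helper, not part of OrthoMCL; it assumes the files end in `.fasta`, that the sequence ID is the first whitespace-delimited token after `>`, and that the BLAST table has 12 tab-separated columns.

```python
import os


def fasta_ids(fasta_dir):
    """Collect the sequence IDs from every '*.fasta' file in fasta_dir."""
    ids = set()
    for name in os.listdir(fasta_dir):
        if not name.endswith(".fasta"):
            continue
        with open(os.path.join(fasta_dir, name)) as handle:
            for line in handle:
                if line.startswith(">"):
                    tokens = line[1:].split()
                    if tokens:
                        ids.add(tokens[0])
    return ids


def unknown_genes(blast_rows, fasta_dir):
    """Return BLAST query/subject IDs that appear in no FASTA header."""
    known = fasta_ids(fasta_dir)
    unknown = set()
    for row in blast_rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) != 12:
            continue  # skip header or blank lines
        unknown.update(f for f in fields[:2] if f not in known)
    return sorted(unknown)
```

Any ID this reports is a likely candidate for a "couldn't find taxon for gene" error, since it cannot be matched to a sequence in the directory.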


RT replied, 7.2 years ago:

Thanks a lot, Vitis and SES. This is working now. I got as far as step 9, orthomclBlastParser, where it loads the BLAST results into the database. When I run the next step, which generates the potential ortholog, inparalog and co-ortholog pairs, I get empty tables in the database. When I then run the next script, OrthomclDumpPairs, the pairs directory contains three empty files and the mclInput file is also empty. Any ideas?

jollymrt replied, 7.1 years ago:

I have the same problem. I have data in the intermediate tables, but not in the Ortholog, Paralog or CoOrtholog tables.

RT replied, 7.1 years ago:

I managed to run the program and complete all the steps. In my case, I had messed up my database, and there was also a memory problem on my system (it is hard to remember the exact problem now). I took the time to go through all the steps again and again, following every minor point in the manual. It took me two days, but I finally got it working. Try to follow the manual in every detail; if you still can't get it to work, I will be able to help you.

binlu19810 replied, 5.5 years ago:

I have run into the same problem: I only get an empty mclInput file. How did you resolve it? I did not run the all-vs-all BLAST; I only ran a BLAST of my sequences against the reference proteome. Does that matter? Thanks.
RT (European Union) wrote, 7.2 years ago:

Just to add a little more information: I have 30 sequences for which I want to find orthologs in one particular species, so I did not run the all-vs-all BLAST. I only ran a BLAST of my sequences against the proteome of the other species and got the results in the same -m 8 format. This time I carefully followed all the naming conventions given in the manual. Does the problem have something to do with the all-vs-all BLAST?

There is no chance that my sequences do not have orthologs in the other species. Please help.
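As a rough illustration of why the all-vs-all BLAST may matter here: ortholog detection of this kind relies on reciprocal best hits between taxa, which need hits in both directions. The sketch below is not OrthoMCL's actual database logic; the function names are hypothetical, and it ranks hits by bit score (column 12) purely for simplicity.

```python
def best_hits(rows):
    """Map each query ID to its highest-scoring subject from another taxon."""
    best = {}
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        query, subject, bits = fields[0], fields[1], float(fields[11])
        if query.split("|")[0] == subject.split("|")[0]:
            continue  # same taxon: not an ortholog candidate
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(rows):
    """Return sorted (a, b) pairs in which each is the other's best hit."""
    best = best_hits(rows)
    # 'q < s' reports each pair once instead of in both orientations.
    return sorted((q, s) for q, s in best.items() if q < s and best.get(s) == q)
```

With rows from a one-directional search only (ppp queries against xxx subjects), `reciprocal_best_hits` returns an empty list, because no xxx sequence ever appears as a query. Under these assumptions, that would be consistent with empty ortholog output after a one-way BLAST.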

Many thanks once again.

jollymrt wrote, 7.0 years ago:

Thanks for the help, Ritu, but now I am getting a populated InParalog table and empty Ortholog tables. Any idea why that is happening?
