MultiGeneBlast gui vs command prompt
3.5 years ago

Hi Everyone!

MultiGeneBlast searches for homologous collections of genes between an input GenBank file and a .pal database of multiple GenBank files, made using the MultiGeneBlast makedb function. I am trying to run MultiGeneBlast v1.1.13 (win32) from https://sourceforge.net/projects/multigeneblast/files/1.1.13/. I have Windows 10, 64-bit. The prerequisites for MultiGeneBlast are Windows 7 and below, so the issues below may be caused by my Windows 10 OS, but my computer tech friend said this shouldn't be the case.

I've made a local database using the MultiGeneBlast makedb function, from a single genome on my computer, against which I am comparing antiSMASH v5 .gbk region files generated from this genome. This forms a good test case for the program, as I want it to find the genes in the antiSMASH output that are the same in the genome.

However, when I try to use an input gb or gbk file (both of which are supported) and this database with the main MultiGeneBlast call on the command line, I run into strange issues. For example, one error, which doesn't seem to affect the output (it looks like what I would expect for the test cases I am using), is:

Running NCBI BLAST+ searches on GenBank database..
Command line argument error: Argument "query". File is not accessible:  `input.fasta'

Error running BLAST. Retrying...
Command line argument error: Argument "query". File is not accessible:  `input.fasta'

Error running BLAST. Retrying...
Command line argument error: Argument "query". File is not accessible:  `input.fasta'

Error running BLAST. Retrying...
Command line argument error: Argument "query". File is not accessible:  `input.fasta'

Error running BLAST. Retrying...
Error running Blast, exiting. Please check your system.

I've seen a few posts where this kind of error seems to be due to the fasta file being in the wrong directory, or to permission issues. However, I've checked the permissions in the properties of the MultiGeneBlast directory and they seem to be unrestricted, and it's in my main Python directory, where I have manipulated files via scripts before. Regarding the directory where the fasta files are being made, I've had a look at the Python code, and I think the fasta files are created in the same directory as the script (i.e. the MultiGeneBlast directory) and then deleted at the very end of the program (i.e. after the errors above). So it's essentially in the working directory, and from what I can see the permissions for that directory look OK. The caveat here is that I am inferring this from the Python script, but I am actually using the exe file, which is a separate file. I assume it deals with file writing etc. in the same way? Like I said, it doesn't affect the output, but mystery errors concern me.
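
To make the working-directory point concrete: a program that is handed a bare file name such as `input.fasta` resolves it against its current working directory, which is not necessarily the directory the script or exe lives in. The following is a minimal Python 3 sketch of that behaviour, not MultiGeneBlast's own code; the helper names `is_accessible` and `run_in_dir` are made up for illustration, and the child process is just a second Python interpreter standing in for BLAST.

```python
import os
import subprocess
import sys
import tempfile


def is_accessible(filename, workdir):
    """Return True if a relative filename resolves to a readable file in workdir."""
    path = os.path.join(workdir, filename)
    return os.path.isfile(path) and os.access(path, os.R_OK)


def run_in_dir(cmd, workdir):
    """Run cmd with workdir as its current directory; return (exit code, output)."""
    proc = subprocess.run(cmd, cwd=workdir, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


if __name__ == "__main__":
    # Hypothetical stand-ins: tool_dir holds the query file, other_dir does not.
    with tempfile.TemporaryDirectory() as tool_dir, \
            tempfile.TemporaryDirectory() as other_dir:
        with open(os.path.join(tool_dir, "input.fasta"), "w") as handle:
            handle.write(">query\nMKT\n")

        # The child looks for the bare name 'input.fasta' in its own working directory.
        child = [sys.executable, "-c",
                 "import os; print(os.path.exists('input.fasta'))"]
        print(run_in_dir(child, tool_dir)[1].strip())   # file is visible here
        print(run_in_dir(child, other_dir)[1].strip())  # same name, not visible here
```

If the command-line call is launched from a different directory than the one the fasta file is written to, this is the kind of mismatch that would produce a "File is not accessible" message, although whether that is what happens here is an assumption.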

Another error (which I think may be due to the database, as the ctg3_orf00072 accession is not in the input file, but which I otherwise have no clue about) is:

Blast search finished. Parsing results...
{'input|c1|10476-11522|-|ctg3_orf02081|T1PKS|ctg3_orf02081|no_locus_tag': 348, 'input|c1|17144-17335|-|ctg3_orf02095|not_annotated|ctg3_orf02095|no_locus_tag': 63, 'input|c1|54942-55211|-|ctg3_orf02136|not_annotated|ctg3_orf02136|no_locus_tag': 89, 'input|c1|36493-36684|+|ctg3_orf02109|not_annotated|ctg3_orf02109|no_locus_tag': 63, 'input|c1|11754-12986|-|ctg3_orf02084|not_annotated|ctg3_orf02084|no_locus_tag': 410, 'input|c1|46379-46618|+|ctg3_orf02117|not_annotated|ctg3_orf02117|no_locus_tag': 79, 'input|c1|53322-53987|+|ctg3_orf02132|not_annotated|ctg3_orf02132|no_locus_tag': 221, 'input|c1|35552-36496|+|ctg3_orf02108|not_annotated|ctg3_orf02108|no_locus_tag': 314, 'input|c1|18884-19477|-|ctg3_orf02097|not_annotated|ctg3_orf02097|no_locus_tag': 197, 'input|c1|16354-17142|+|ctg3_orf02094|not_annotated|ctg3_orf02094|no_locus_tag': 262, 'input|c1|52315-53268|+|ctg3_orf02130|not_annotated|ctg3_orf02130|no_locus_tag': 317, 'input|c1|57383-58438|+|ctg3_orf02141|not_annotated|ctg3_orf02141|no_locus_tag': 351, 'input|c1|47682-48596|+|ctg3_orf02122|not_annotated|ctg3_orf02122|no_locus_tag': 304, 'input|c1|7224-7490|-|ctg3_orf02074|not_annotated|ctg3_orf02074|no_locus_tag': 88, 'input|c1|50462-51724|-|ctg3_orf02125|not_annotated|ctg3_orf02125|no_locus_tag': 420, 'input|c1|19937-23395|+|ctg3_orf02099|not_annotated|ctg3_orf02099|no_locus_tag': 1152, 'input|c1|45432-46109|-|ctg3_orf02116|not_annotated|ctg3_orf02116|no_locus_tag': 225, 'input|c1|1779-2576|+|ctg3_orf02067|not_annotated|ctg3_orf02067|no_locus_tag': 265, 'input|c1|39851-45361|+|ctg3_orf02113|not_annotated|ctg3_orf02113|no_locus_tag': 1836, 'input|c1|15428-16183|-|ctg3_orf02092|not_annotated|ctg3_orf02092|no_locus_tag': 251, 'input|c1|64314-64580|+|ctg3_orf02155|not_annotated|ctg3_orf02155|no_locus_tag': 88, 'input|c1|6073-7047|-|ctg3_orf02072|not_annotated|ctg3_orf02072|no_locus_tag': 324, 'input|c1|2573-3214|+|ctg3_orf02068|not_annotated|ctg3_orf02068|no_locus_tag': 213, 
'input|c1|7117-7227|+|ctg3_orf02071|not_annotated|ctg3_orf02071|no_locus_tag': 36, 'input|c1|30019-35526|+|ctg3_orf02106|not_annotated|ctg3_orf02106|no_locus_tag': 1835, 'input|c1|7581-8399|+|ctg3_orf02073|not_annotated|ctg3_orf02073|no_locus_tag': 272, 'input|c1|14183-15328|+|ctg3_orf02089|not_annotated|ctg3_orf02089|no_locus_tag': 381, 'input|c1|17487-18863|+|ctg3_orf02096|not_annotated|ctg3_orf02096|no_locus_tag': 458, 'input|c1|51879-52205|+|ctg3_orf02127|not_annotated|ctg3_orf02127|no_locus_tag': 108, 'input|c1|55397-55807|+|ctg3_orf02137|not_annotated|ctg3_orf02137|no_locus_tag': 136, 'input|c1|63523-64275|+|ctg3_orf02154|not_annotated|ctg3_orf02154|no_locus_tag': 250, 'input|c1|54235-54849|-|ctg3_orf02133|not_annotated|ctg3_orf02133|no_locus_tag': 204, 'input|c1|56206-57390|+|ctg3_orf02139|not_annotated|ctg3_orf02139|no_locus_tag': 394, 'input|c1|1270-1599|-|ctg3_orf02065|not_annotated|ctg3_orf02065|no_locus_tag': 109, 'input|c1|62931-63521|+|ctg3_orf02152|not_annotated|ctg3_orf02152|no_locus_tag': 196, 'input|c1|61731-62744|-|ctg3_orf02149|not_annotated|ctg3_orf02149|no_locus_tag': 337, 'input|c1|13074-14165|+|ctg3_orf02087|not_annotated|ctg3_orf02087|no_locus_tag': 363, 'input|c1|36681-39854|+|ctg3_orf02111|not_annotated|ctg3_orf02111|no_locus_tag': 1057, 'input|c1|59644-61602|-|ctg3_orf02148|not_annotated|ctg3_orf02148|no_locus_tag': 652, 'input|c1|46800-47633|+|ctg3_orf02118|not_annotated|ctg3_orf02118|no_locus_tag': 277, 'input|c1|28040-30022|+|ctg3_orf02103|not_annotated|ctg3_orf02103|no_locus_tag': 660, 'input|c1|1-1206|+|ctg3_orf02062|NRPS|ctg3_orf02062|no_locus_tag': 401, 'input|c1|3443-6043|+|ctg3_orf02069|not_annotated|ctg3_orf02069|no_locus_tag': 866, 'input|c1|48733-50349|+|ctg3_orf02123|not_annotated|ctg3_orf02123|no_locus_tag': 538, 'input|c1|8450-9226|+|ctg3_orf02076|not_annotated|ctg3_orf02076|no_locus_tag': 258, 'input|c1|23392-27999|+|ctg3_orf02102|not_annotated|ctg3_orf02102|no_locus_tag': 1535, 
'input|c1|9562-10380|+|ctg3_orf02078|not_annotated|ctg3_orf02078|no_locus_tag': 272, 'input|c1|58578-59597|+|ctg3_orf02143|not_annotated|ctg3_orf02143|no_locus_tag': 339}
Error: no sequence length found for input|c1|5001-6014|-|ctg3_orf00072|not_annotated|ctg3_orf00072|no_locus_tag
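
The mismatch can be checked mechanically. The sketch below assumes the printed dictionary maps query IDs to protein lengths (a reading of the output, not confirmed from the MultiGeneBlast source) and that the accession is the fifth `|`-separated field of each ID. The helper names `accession` and `missing_lengths` are hypothetical, and only two dictionary entries are copied in to keep it short.

```python
def accession(record_id):
    """Return the accession (fifth '|'-separated field) of an ID like those above."""
    return record_id.split("|")[4]


def missing_lengths(lengths, hit_ids):
    """Return the IDs from hit_ids that have no entry in the lengths dictionary."""
    return [hit for hit in hit_ids if hit not in lengths]


# Two entries copied from the dictionary printed above.
lengths = {
    "input|c1|10476-11522|-|ctg3_orf02081|T1PKS|ctg3_orf02081|no_locus_tag": 348,
    "input|c1|1-1206|+|ctg3_orf02062|NRPS|ctg3_orf02062|no_locus_tag": 401,
}

# The ID named in the error message, plus one ID that is present in the dictionary.
hits = [
    "input|c1|5001-6014|-|ctg3_orf00072|not_annotated|ctg3_orf00072|no_locus_tag",
    "input|c1|1-1206|+|ctg3_orf02062|NRPS|ctg3_orf02062|no_locus_tag",
]

for hit in missing_lengths(lengths, hits):
    print("no length for", accession(hit))
```

Every accession in the printed dictionary falls in the ctg3_orf020xx to ctg3_orf021xx range, while the ID in the error is ctg3_orf00072, so the BLAST results being parsed appear to refer to a query that the length table was not built from.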

Now, the confusing thing is that when I use the exact same files, databases and settings immediately after this, using the GUI .exe, it works perfectly, with none of the Command line argument error: outputs. Furthermore, once the GUI has been run, the command-line call that was failing before also works with the exact same files and settings (which won't work for what I am hoping to use this for, unfortunately: I need it to work from the command line straight away!). The Command line argument error: outputs are still there, but the output looks exactly like the GUI output and the script completes and generates a valid output report file.

I could understand why the GUI might work, as it has a separate script that potentially handles files etc. in a different way that avoids the bugs seen with the main MultiGeneBlast exe. But why does running it fix the issues seen on the command line when calling the main MultiGeneBlast exe?


(reference) Marnix H. Medema, Eriko Takano, Rainer Breitling, Detecting Sequence Homology at the Gene Cluster Level with MultiGeneBlast, Molecular Biology and Evolution, Volume 30, Issue 5, May 2013, Pages 1218–1223, https://doi.org/10.1093/molbev/mst025


Do not add an answer unless you're answering the principal question. Use Add Comment to add related information. Or you can even edit your post and add the information in there. I've moved your post to a comment this time.


Cheers for the info!


Can you post the directory structure and the exact command you're using on the command line?

3.5 years ago

Hi Joe, thanks for replying to this. Looks like you're turning into my biostars guardian angel :P I had a chat with my supervisors (I'm a PhD student), who know the guy who made MultiGeneBlast, and they say it's broken, so I've changed my plan for this.


The code does work in my hands, but it is not the most well put together codebase, so it took some real digging to get it working. If you can make do with other approaches, I'd go for it!
