Here's the relevant part of my code:
query = raw_input("Enter query (fasta)\n")
database2 = raw_input("Enter organism 2 and genome name (fasta) in the format 'organism, genome'\n")
organism2 = database2[0:database1.find(",")]
genomehandle2 = database2[database1.find(",")+2:]
blast2 = NcbiblastnCommandline(query= str(query) + ".fasta", db=str(genomehandle2)+".fa", evalue=1e-10, outfmt=7, out="blast" + str(query) + str(organism2) + ".txt", num_threads=4)
stdout, stderr = blast2()
This works normally for any organism which has a name with 5 characters or fewer. However it's not that there's a limit on the total output handle length, because if I remove the "+str(query)+" it STILL tries to cut the organism name down to 5 characters, and if I keep the query which is way larger than 5 characters in there this still works with any organism name given that it is 5 characters or smaller.
"Enter organism 2 and genome name (fasta) in the format 'organism, genome'
macaq, macfas5" -> This works and starts BLAST
"Enter organism 2 and genome name (fasta) in the format 'organism, genome'
macaque, macfas5" -> This fails and returns the error:
"Traceback (most recent call last):
File "pipeline.py", line 30, in <module>
stdout, stderr = blast2()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/Application/__init__.py", line 523, in __call__
stdout_str, stderr_str)
Bio.Application.ApplicationError: Non-zero return code 1 from 'blastn -out blastmacaq.txt -outfmt 7 -query HERV-K113.fasta -db , macfas5.fa -evalue 1e-10 -num_threads 4', message 'USAGE'"
I give all the input at the start and it never asks again. I don't get what your point is here.
Actually I've given my input in the post but someone went and modified my post so it's given in code format for whatever reason.
"macaq, macfas5" works as input
"macaque, macfas5" does not work
Thanks for identifying the issue. I'm gonna try to fix that bit now.
The issue was this specific part in my script:
The script begins by BLASTing two different databases. So far my first database has always been for the human genome. "human" happens to begin with 5 characters. When I find "," in "human, hg38" I get a value that is smaller than the position of the comma in "macaque, macfas5". Instead of taking "macfas5", it was beginning 2 characters further back than it should and taking ", macfas5" because it was looking at the length of "human" instead of the length of "macaque".
I still don't agree with your comments on raw_input() though. I might be new to this and it was the first way I found out how to assign terminal input to variables, but I give all the parameters at the start of the script and then it carries out the whole thing for half an hour without any further questions. The fact that I specifically used raw_input() isn't the source of the issue. Someone just reformatted my post for no reason and you somehow skipped over the part you wanted for troubleshooting. If I used parameters in the commandline I would have entered them into the terminal. When I use raw_input(), I still enter everything into the terminal. If I copy what I see in the terminal and paste it here, you get the same information regardless of what I used except when I use parameters you have no idea what they're referring to unless I tell you.
Ther problem with using
raw_input
is that it does not preserve the input that you put in. So it may not cause the problem but it hides the data from you.Nobody else knows what you have typed in there - you may think you know but it is easy to mistype it.
It is much simpler to list it at the command line that way the whole thing is explicit. Isn't it a lot easier to rerun something as
then use the arrow key to recall the previous command rather than having to retype.
What do you mean it doesn't preserve it? It's visible in the terminal long after it's entered and it's saved into the log my script generates. I don't see how it's hidden. Here's a screenshot of me entering the first few parameters into the terminal using raw_input():
The benefit of running it the way you describe is that I can enter everything in one line which is just one step faster than if I put all my variables into a single raw_input() question and split the input by " ", but the whole point of having it separated out is so that it makes sense and acts more like a menu rather than a command where the parameters and their order need to be memorised. This script requires 7 parameters right now. Is it really that user-friendly to have them all required when you're calling the script? And you say "retype", but the entire thing takes 20-30min to run, I'm not exactly repeating it over and over and I only need to enter these parameters once at the start.
And the reason I'm probably defensive about this is that I don't really understand how I can get Python to recognise the parameters I enter along with the script filename. If I ran pipeline.py param1 param2 param3 in the terminal, what function would I use to retrieve those parameters in Python? I think you're assuming the parameters are only used by the commandline BLAST search. This script does more than just run two local BLAST searches, it also has to retrieve flanks from each hit, reBLAST the flanks, align two sets of flanks against each other and some other stuff. The parameters need to be used in Python. The reason I did not give this context or the entire script is because I thought it wasn't necessary to fix the issue specific to the initial BLAST searches, and it really wasn't.
the command line parameters will be stored in the module attribute
sys.argv
as a listthen
will print
Thanks, that's kinda nice. I'll rewrite it for the next time I run the script.
The next step is to use the argparse library that will allow you to write nice programs that take named parameters like so:
and so on. At that point your program's usage becomes self-documenting.
https://docs.python.org/3/library/argparse.html