I'm looking to run a number of FASTA files through local BLAST simultaneously as part of a pipeline. I'm using Biopython to read my input file and parse out a specified number of sequences, e.g. 1000; if the file is larger than that, I split it into batches of 1000. What I need now is a way to run each batch file through local BLAST rather than one at a time, and then concatenate all the output files I receive for post-BLAST parsing by E-value.
    from Bio import SeqIO

    def batch_iterator(iterator, batch_size):
        """Generator splitting an iterator into lists of up to batch_size records."""
        batch = []
        for entry in iterator:
            batch.append(entry)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:  # yield any leftover records at end of file
            yield batch

    # Count the sequences; if the input file has more than 10, split it into batch files
    counter = sum(1 for record in SeqIO.parse(Input_file, "fasta"))
    if counter > 10:
        record_iter = SeqIO.parse(Input_file, "fasta")
        for i, batch in enumerate(batch_iterator(record_iter, 10)):
            filename = "batch_%i.fasta" % (i + 1)
            with open(filename, "w") as handle:
                count = SeqIO.write(batch, handle, "fasta")
            print("Wrote %i records to %s" % (count, filename))
What would be the best way to automate this so I can grab all my batch files and run them through local BLAST? Would I have to call
./blastp -db a_database -query queryfile.fasta -out blastoutput.tsv -outfmt 6 for each individual file name via os.system() in my script, or is there a simpler way?
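One way to sketch this, rather than shelling out with os.system(), is to collect the batch files with glob, launch blastp on each one via subprocess in a process pool, and then concatenate the tabular outputs. This is a minimal sketch, not a definitive pipeline: the database name a_database comes from the command above, while the file patterns (batch_*.fasta), the worker count, and the combined output name all_hits.tsv are illustrative assumptions, and it presumes blastp is on your PATH.

```python
import glob
import subprocess
from concurrent.futures import ProcessPoolExecutor

def blast_command(query, db="a_database"):
    # Build the blastp command line for one batch file (db name taken from
    # the question; output file name is derived from the query file name).
    out = query.replace(".fasta", ".tsv")
    return ["blastp", "-db", db, "-query", query,
            "-out", out, "-outfmt", "6"]

def run_blast(query):
    # Run one blastp job; check=True raises if blastp exits non-zero.
    subprocess.run(blast_command(query), check=True)
    return query.replace(".fasta", ".tsv")

def blast_all_batches(pattern="batch_*.fasta", workers=4,
                      combined="all_hits.tsv"):
    # Run blastp on every batch file in parallel, then concatenate
    # the per-batch outfmt-6 tables for post-BLAST E-value parsing.
    batch_files = sorted(glob.glob(pattern))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        outputs = list(pool.map(run_blast, batch_files))
    with open(combined, "w") as out_handle:
        for tsv in outputs:
            with open(tsv) as fh:
                out_handle.write(fh.read())
    return combined
```

Since outfmt 6 is plain tab-separated lines with no header, simple concatenation is safe, and the merged file can be sorted or filtered on the E-value column (column 11) afterwards. Note that blastp also has its own -num_threads option, so you may want to balance the number of parallel processes against threads per process.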