Question: Passing multiple fasta files to local blast
written 14 months ago by New2programming:

I'm looking to run a number of FASTA files through local BLAST simultaneously as part of a pipeline. I'm using Biopython to read my input file and parse out a specified number of sequences, e.g. 1000; if the file is larger, I split it into batches of 1000. I'm now looking for a way to run each batch file through local BLAST, rather than one at a time, and then concatenate all the output files I receive for post-BLAST parsing by E-value.

from Bio import SeqIO  # needed for SeqIO.parse / SeqIO.write below

def batch_iterator(iterator, batch_size):
    """Generator that splits a large file into batches of batch_size records."""
    entry = True  # make sure we loop once
    while entry:
        batch = []
        while len(batch) < batch_size:
            try:
                entry = next(iterator)
            except StopIteration:
                entry = None
            if entry is None:
                break  # end of file
            batch.append(entry)
        if batch:
            yield batch

# Count the sequences first; if the input file has more than 10, batch it out
num_seqs = sum(1 for _ in SeqIO.parse(Input_file, "fasta"))
if num_seqs > 10:
    record_iter = SeqIO.parse(Input_file, "fasta")
    for i, batch in enumerate(batch_iterator(record_iter, 10)):
        filename = "batch_%i.fasta" % (i + 1)
        with open(filename, "w") as handle:
            count = SeqIO.write(batch, handle, "fasta")
        print("Wrote %i records to %s" % (count, filename))
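As a quick sanity check, the generator above works on any iterator, not just Biopython records; the dummy integers below simply stand in for SeqRecord objects:

```python
def batch_iterator(iterator, batch_size):
    """Yield lists of at most batch_size items from any iterator."""
    entry = True  # make sure we loop once
    while entry:
        batch = []
        while len(batch) < batch_size:
            try:
                entry = next(iterator)
            except StopIteration:
                entry = None
            if entry is None:
                break  # iterator exhausted
            batch.append(entry)
        if batch:
            yield batch

# 25 dummy "records" split into batches of 10 -> batch sizes 10, 10, 5
sizes = [len(b) for b in batch_iterator(iter(range(1, 26)), 10)]
print(sizes)  # [10, 10, 5]
```

Note that an empty input yields no batches at all, which is why the `if batch:` guard matters.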

What would be the best way to automate this, so that I grab all my batch files and run each through local BLAST? Would I have to call ./blastp -db a_database -query queryfile.fasta -out blastoutput.tsv -outfmt 6 for each individual file via os.system() in my script, or is there a simpler way?

written 14 months ago by New2programming

Is there any specific reason you want to batch this analysis (e.g. to run it in parallel on a compute cluster)? Otherwise it will be more efficient to run the BLAST with one big input file.

written 14 months ago by lieven.sterck

I found this, in case it helps: https://gif.biotech.iastate.edu/running-blast-jobs-parallel

written 14 months ago by Bastien Hervé

That link is actually where I started my query; I'm just wondering if there's a way to do it in Python rather than Bash.

written 14 months ago by New2programming

From 2008 (so it may be out of date): http://bpbio.blogspot.fr/2008/02/parallel-blasts-using-pythons-pp-module.html

written 14 months ago by Bastien Hervé

You could use Python's multiprocessing module.

Alternatively, use subprocess to pass BLAST commands to GNU Parallel at the command line. How you batch up the files before invoking either of these is entirely up to you in the Python script.
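A minimal sketch of the multiprocessing + subprocess route, assuming the batch files are named batch_*.fasta as in the question, the database is called a_database, and blastp is on your PATH:

```python
import glob
import subprocess
from multiprocessing import Pool

def build_blast_cmd(query_file):
    """Build the blastp argument list for one batch file (db name assumed)."""
    out_file = query_file.rsplit(".", 1)[0] + ".tsv"
    return ["blastp", "-db", "a_database",
            "-query", query_file, "-out", out_file, "-outfmt", "6"]

def run_blast(query_file):
    # Each worker launches one blastp process and waits for it to finish
    subprocess.run(build_blast_cmd(query_file), check=True)
    return query_file

def run_all(pattern="batch_*.fasta", processes=4):
    """Run up to `processes` blastp jobs at a time over all batch files."""
    batch_files = sorted(glob.glob(pattern))
    with Pool(processes=processes) as pool:
        for done in pool.imap_unordered(run_blast, batch_files):
            print("finished", done)
```

Calling run_all() after the batching step runs all batches with at most four blastp processes at once; since each blastp is already a separate process, Pool here only throttles how many run concurrently, so set processes to roughly your core count. The per-batch .tsv outputs can then be concatenated for the post-BLAST E-value parsing.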

written 14 months ago by jrj.healey
Powered by Biostar version 2.3.0