I'm looking to run a number of FASTA files through local BLAST simultaneously as part of a pipeline. I'm using Biopython to read my input file and parse out a specified number of sequences, e.g. 1000; if the file is larger than that, I split it into batches of 1000. What I need now is a way to run each batch file through local BLAST rather than one at a time, and then concatenate all the output files I receive for post-BLAST parsing by E-value.
    from Bio import SeqIO

    def batch_iterator(iterator, batch_size):
        """Generator splitting an iterator into lists of up to batch_size records."""
        batch = []
        for entry in iterator:
            batch.append(entry)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:  # yield any leftover records at end of file
            yield batch

    # Count the sequences; if the input file has more than 10, split it into batch files
    counter = sum(1 for record in SeqIO.parse(Input_file, "fasta"))
    if counter > 10:
        record_iter = SeqIO.parse(Input_file, "fasta")
        for i, batch in enumerate(batch_iterator(record_iter, 10)):
            filename = "batch_%i.fasta" % (i + 1)
            with open(filename, "w") as handle:
                count = SeqIO.write(batch, handle, "fasta")
            print("Wrote %i records to %s" % (count, filename))
What would be the best way to automate this so I can grab all my batch files and run them through local BLAST? Would I have to call
./blastp -db a_database -query queryfile.fasta -out blastoutput.tsv -outfmt 6 for each individual file name via os.system() in my script, or is there a simpler way?
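One way to sketch this, rather than shelling out with os.system(), is to collect the batch files with glob, launch blastp on each one via subprocess in a process pool, and then concatenate the tabular outputs. This is a minimal sketch, not a definitive pipeline: the database name a_database comes from the command above, while the file patterns (batch_*.fasta), the worker count, and the combined output name all_hits.tsv are illustrative assumptions, and it presumes blastp is on your PATH.

```python
import glob
import subprocess
from concurrent.futures import ProcessPoolExecutor

def blast_command(query, db="a_database"):
    # Build the blastp command line for one batch file (db name taken from
    # the question; output file name is derived from the query file name).
    out = query.replace(".fasta", ".tsv")
    return ["blastp", "-db", db, "-query", query,
            "-out", out, "-outfmt", "6"]

def run_blast(query):
    # Run one blastp job; check=True raises if blastp exits non-zero.
    subprocess.run(blast_command(query), check=True)
    return query.replace(".fasta", ".tsv")

def blast_all_batches(pattern="batch_*.fasta", workers=4,
                      combined="all_hits.tsv"):
    # Run blastp on every batch file in parallel, then concatenate
    # the per-batch outfmt-6 tables for post-BLAST E-value parsing.
    batch_files = sorted(glob.glob(pattern))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        outputs = list(pool.map(run_blast, batch_files))
    with open(combined, "w") as out_handle:
        for tsv in outputs:
            with open(tsv) as fh:
                out_handle.write(fh.read())
    return combined
```

Since outfmt 6 is plain tab-separated lines with no header, simple concatenation is safe, and the merged file can be sorted or filtered on the E-value column (column 11) afterwards. Note that blastp also has its own -num_threads option, so you may want to balance the number of parallel processes against threads per process.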