Question

Getting Top 10 Sequences Of Blast Results Bio Python

3

Entering edit mode

12.8 years ago

Ankur ▴ 40

Hi,

I want to get top 10 sequences of BLAST results (just the sequences, no alignment or score or e-value etc). I am inputting a text file containing 5 fasta file. So my output should be top 10 blast hits of each fasta file.. therefore my output file will have 50 sequences.

I am reading each of my input fasta file through Bio.SeqIO, writing it as temp.faa and then passing it to command line BLAST through subprocess as

blastp -db nr -query temp.faa -out out.faa -evalue 0.001 -gapopen 11 -gapextend 1 -matrix BLOSUM62 -remote -outfmt 2

the output has lots of other information. Should I parse this output now or there's a better way.

Thanks

P.S XML might be the way, but I didn't find a relavant NCBIXML parser syntax

ncbi blast biopython python sequence • 13k views

ADD COMMENT • link updated 12.8 years ago by Peter 6.0k • written 12.8 years ago by Ankur ▴ 40

score 6 · Answer 1 · 2011-07-05

6

Entering edit mode

12.8 years ago

Peter 6.0k

An alternative to filtering the top 10 hits in a script after running BLAST is to ask BLAST to do this for you (using argument -maxtargetseqs). This way the output file will be smaller and the BLAST job should be faster to run.

Also, if you don't like XML and really just want the sequence, specify tabular output with just the matched sequence (field name sseq).

ADD COMMENT • link 12.8 years ago by Peter 6.0k

0

Entering edit mode

Makes for much easier/quicker parsing as well in order to grab the full sequences from the relevant sequence file.

ADD REPLY • link 12.8 years ago by DG 7.3k

score 3 · Answer 2 · 2011-07-05

3

Entering edit mode

12.8 years ago

Hranjeev ★ 1.5k

This is a good guide in doing what you want.

You may want to parse like this for getting the sequence.


    from Bio.Blast import NCBIXM

    blast_records = NCBIXML.parse(blast_handle)

    save_file = open("/path/to/file/my_fasta_seq.fasta", 'w')

    for i,blast_record in enumerate(blast_records):
      if i==10: break
      for alignment in blast_record.alignments:
          for hsp in alignment.hsps:
                save_file.write('>%s\n' % (hsp.query))
    save_file.close()

Update: fixed some errors

ADD COMMENT • link 12.8 years ago by Hranjeev ★ 1.5k

0

Entering edit mode

fixed an error; should work now

ADD REPLY • link 12.8 years ago by Hranjeev ★ 1.5k

0

Entering edit mode

@HRanjeev for a multi-fasta file wont this simply loop through the first 10 10 records, taking ALL of the hits for each record?

I think you want the enumerate/break in the alignment part.

ADD REPLY • link 12.3 years ago by Zach Powers ▴ 340