Getting Top 10 Sequences Of Blast Results Bio Python
2
3
Entering edit mode
12.8 years ago
Ankur ▴ 40

Hi,

I want to get top 10 sequences of BLAST results (just the sequences, no alignment or score or e-value etc). I am inputting a text file containing 5 fasta file. So my output should be top 10 blast hits of each fasta file.. therefore my output file will have 50 sequences.

I am reading each of my input fasta file through Bio.SeqIO, writing it as temp.faa and then passing it to command line BLAST through subprocess as

blastp -db nr -query temp.faa -out out.faa -evalue 0.001 -gapopen 11 -gapextend 1 -matrix BLOSUM62 -remote -outfmt 2

the output has lots of other information. Should I parse this output now or there's a better way.

Thanks

P.S XML might be the way, but I didn't find a relavant NCBIXML parser syntax

ncbi blast biopython python sequence • 13k views
ADD COMMENT
6
Entering edit mode
12.8 years ago
Peter 6.0k

An alternative to filtering the top 10 hits in a script after running BLAST is to ask BLAST to do this for you (using argument -maxtargetseqs). This way the output file will be smaller and the BLAST job should be faster to run.

Also, if you don't like XML and really just want the sequence, specify tabular output with just the matched sequence (field name sseq).

ADD COMMENT
0
Entering edit mode

Makes for much easier/quicker parsing as well in order to grab the full sequences from the relevant sequence file.

ADD REPLY
3
Entering edit mode
12.8 years ago
Hranjeev ★ 1.5k

This is a good guide in doing what you want.

You may want to parse like this for getting the sequence.


    from Bio.Blast import NCBIXM

    blast_records = NCBIXML.parse(blast_handle)

    save_file = open("/path/to/file/my_fasta_seq.fasta", 'w')

    for i,blast_record in enumerate(blast_records):
      if i==10: break
      for alignment in blast_record.alignments:
          for hsp in alignment.hsps:
                save_file.write('>%s\n' % (hsp.query))
    save_file.close()

Update: fixed some errors

ADD COMMENT
0
Entering edit mode

fixed an error; should work now

ADD REPLY
0
Entering edit mode

@HRanjeev for a multi-fasta file wont this simply loop through the first 10 10 records, taking ALL of the hits for each record?

I think you want the enumerate/break in the alignment part.

ADD REPLY

Login before adding your answer.

Traffic: 2941 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6