Question: Parsing Blast Output Biopython Error
Ankur wrote (2.8 years ago):

Hi, I have the following code

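import subprocess

from Bio import SeqIO
from Bio.Blast import NCBIXML


def runBLAST(self):
    print("Running BLAST ..........")
    # -outfmt 5 makes blastp write XML, which is the format NCBIXML reads
    cmd = subprocess.Popen("blastp -db nr -query repeat.txt -out out.faa -evalue 0.001 -gapopen 11 -gapextend 1 -matrix BLOSUM62 -remote -outfmt 5", shell=True)
    cmd.communicate()  # wait for blastp to finish
    f1 = open("out.faa")
    blast_records = NCBIXML.parse(f1)  # a generator, not a list
    save_file = open("my_fasta_seq.fasta", 'w')
    for blast_record in blast_records[:10]:  # TypeError: 'generator' object is not subscriptable
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                # FASTA header, then the aligned subject sequence (may contain '-' gaps)
                save_file.write('>%s\n%s\n' % (alignment.title, hsp.sbjct))
    save_file.close()
    f1.close()
    f2 = open("my_fasta_seq.fasta")
    for record in SeqIO.parse(f2, "fasta"):
        # "w" truncates the file on every iteration, so only the last record is kept
        f = open("tempBLAST1.txt", "w")
        f.write(">" + str(record.name) + "\n" + str(record.seq) + "\n")
        f.close()
    f2.close()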

I get a TypeError on the line for blast_record in blast_records[:10]: saying 'generator' object is not subscriptable. I am trying to get the top 10 BLAST hits (sequences).

Michael Kuhn (Dresden, Germany) wrote (2.8 years ago):

This is not a Biopython-specific problem but a general Python question, answered e.g. on Stack Overflow. It might be that Biopython parses the next result only on demand; in that case you might be better off with:

for i, blast_record in enumerate(blast_records):
    if i == 10:
        break
    # ... process blast_record here ...
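An equivalent to the enumerate/break loop that avoids the manual counter is itertools.islice from the standard library, which yields the first n items of any iterator without indexing it. In this sketch, fake_parse is a hypothetical stand-in for NCBIXML.parse so that the example runs without Biopython or a BLAST output file.

```python
from itertools import islice


def fake_parse(n_records):
    """Hypothetical stand-in for NCBIXML.parse: a generator yielding one item at a time."""
    for i in range(n_records):
        yield "record_%d" % i


blast_records = fake_parse(25)

# islice(iterable, stop) yields at most `stop` items and never indexes the
# generator, so it works where blast_records[:10] raises TypeError.
first_ten = list(islice(blast_records, 10))
print(len(first_ten))  # 10
print(first_ten[0])    # record_0

# islice consumed only those ten items; the generator resumes after them.
print(next(blast_records))  # record_10
```

With Biopython the loop would read for blast_record in islice(NCBIXML.parse(f1), 10):. Note that, like the enumerate version, this counts BLAST records, not individual hits.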

It's also a follow-up to the previous question: http://biostar.stackexchange.com/questions/9880/getting-top-10-sequences-of-blast-results-bio-python and perhaps should have continued there instead. It's fine to edit your questions and discuss answers in the comments, rather than starting a new question for every variation of the same problem.

Comment by Neilfws (2.8 years ago)

As Michael says, blast_records is a generator (an iterator). You can loop over it or step through it explicitly by calling next(), but you cannot access records by index. This is a general design pattern for coping with very large files made up of many smaller records, and the Biopython SeqIO.parse function works the same way.

Comment by Peter (2.8 years ago)
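A small demonstration of the generator behaviour Peter describes, using hypothetical stand-ins (FakeRecord, fake_parse) in place of Biopython's record objects. It also shows a point worth checking against the original goal: NCBIXML.parse yields one record per query sequence, so with a single query in repeat.txt there is only one record, and the individual hits are in that record's alignments attribute, which is an ordinary list and can be sliced.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class FakeRecord:
    """Hypothetical stand-in for a Biopython BLAST record (one per query)."""
    query: str
    alignments: List[str] = field(default_factory=list)


def fake_parse() -> Iterator[FakeRecord]:
    """Hypothetical stand-in for NCBIXML.parse: yields records lazily."""
    yield FakeRecord("query_1", ["hit_%d" % i for i in range(50)])


blast_records = fake_parse()

# Indexing or slicing a generator fails, exactly as in the question.
try:
    blast_records[:10]
except TypeError as err:
    print(err)  # 'generator' object is not subscriptable

# next() pulls the first (here: only) record from the generator.
blast_record = next(blast_records)

# The hits are held in a plain list, so slicing it is fine.
top_hits = blast_record.alignments[:10]
print(top_hits[0], len(top_hits))  # hit_0 10

# The generator is now exhausted; passing a default avoids StopIteration.
print(next(blast_records, None))  # None
```

With real Biopython output the same pattern would be blast_record = next(NCBIXML.parse(f1)) followed by for alignment in blast_record.alignments[:10]:. For a file known to contain exactly one query, NCBIXML.read(f1) returns that single record directly.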