Functional Annotations From BLAST
12.1 years ago
Mnowotka ▴ 20

Hello, I'm writing a program that takes sequences in some format (FASTA, for example), performs a BLAST search on them, takes a functional annotation of each result, and then looks in the Gene Ontology database to get the standard representation of these annotations. The problem is that BLAST returns only similar sequences, not the annotations of these sequences. All I get is a description and an accession number (like AK333026), and I don't know what to do with this number. Look it up in some database? OK, but I don't think there is a single database I have to look up. So, given an accession number, how can I determine which database to query? Also, I'm doing this in Biopython, and I'm not quite sure whether it provides an interface for looking up all those databases. Any hints?

biopython blast • 2.9k views

What do you want to do? An alternative to Blast2GO?

12.1 years ago

The identifiers are accession numbers from NCBI's databases, which you can query using Entrez. Biopython provides an interface for retrieving records from Entrez, fully documented in the Biopython Tutorial. To search for and retrieve the GenBank record for your example:

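from Bio import Entrez
from Bio import SeqIO

# NCBI asks for a real contact address with every Entrez request
Entrez.email = "test@example.com"
accession = "AK333026"

# Search the nucleotide database to get NCBI's internal ID for the accession
handle = Entrez.esearch(db="nucleotide", term=accession)
rec = Entrez.read(handle)
handle.close()
search_id = rec["IdList"][0]

# Fetch the full GenBank flat file and parse it into a SeqRecord
handle = Entrez.efetch(db="nucleotide", id=search_id, rettype="gb",
                       retmode="text")
rec = SeqIO.read(handle, "genbank")
handle.close()
print(rec)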

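You can use the resulting Biopython SeqRecord to retrieve the information you're interested in: the record description, its annotations dictionary, and its features, whose qualifiers hold fields such as gene and product.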
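As a minimal sketch of that last step, the helper below pulls the gene, product and db_xref qualifiers out of a parsed GenBank record. The name summarize_features and the default choice of gene and CDS features are illustrative assumptions, not part of Biopython; whether a given record actually carries these qualifiers depends on how it was annotated.

```python
def summarize_features(record, feature_types=("gene", "CDS")):
    """Return (type, gene, product, db_xrefs) tuples for selected features.

    Hypothetical helper. `record` is assumed to be a SeqRecord parsed from a
    GenBank file; only `record.features` is used, so any object whose
    features expose `.type` and `.qualifiers` works as well.
    """
    rows = []
    for feat in record.features:
        if feat.type not in feature_types:
            continue
        # GenBank qualifiers map a name to a list of string values
        quals = feat.qualifiers
        rows.append((
            feat.type,
            quals.get("gene", [""])[0],
            quals.get("product", [""])[0],
            list(quals.get("db_xref", [])),
        ))
    return rows
```

With the record fetched above, iterating over summarize_features(rec) and printing each row lists one tuple per gene or CDS feature. The db_xref cross-references are one possible route toward other resources such as Gene Ontology annotations; that mapping step is not shown here.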

12.1 years ago

Related: I wrote a Java program that displays the GenBank annotations for both the BLAST hit and the BLAST query. See my post: http://plindenbaum.blogspot.fr/2010/11/blastxmlannotations.html
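The Java tool itself is not reproduced here, but a rough Python sketch of its first step (collecting hit accessions from BLAST XML and choosing which Entrez database to fetch them from) might look like this. It assumes the hits come from NCBI databases such as nt or nr, and that each record is one yielded by Biopython's NCBIXML.parse(); the helper names and the 1e-10 e-value cutoff are arbitrary, hypothetical choices.

```python
# Entrez database holding the hit records for each BLAST program:
# blastn/tblastn/tblastx search nucleotide databases, blastp/blastx protein ones.
ENTREZ_DB_FOR_PROGRAM = {
    "blastn": "nucleotide",
    "tblastn": "nucleotide",
    "tblastx": "nucleotide",
    "blastp": "protein",
    "blastx": "protein",
}


def entrez_db_for(program):
    """Map a BLAST program name (any case, e.g. record.application) to an Entrez db."""
    return ENTREZ_DB_FOR_PROGRAM[program.strip().lower()]


def collect_hit_accessions(blast_record, max_evalue=1e-10):
    """Return unique hit accessions whose best HSP passes the e-value cutoff.

    Hypothetical helper; only `blast_record.alignments`,
    `alignment.accession` and `hsp.expect` are used.
    """
    accessions = []
    seen = set()
    for alignment in blast_record.alignments:
        # Best (smallest) e-value among this hit's HSPs; no HSPs means no hit
        best = min((hsp.expect for hsp in alignment.hsps), default=float("inf"))
        if best <= max_evalue and alignment.accession not in seen:
            seen.add(alignment.accession)
            accessions.append(alignment.accession)
    return accessions
```

In use, one would loop over NCBIXML.parse(handle), pass entrez_db_for(blast_record.application) and the comma-joined accessions to Entrez.efetch(db=..., id=..., rettype="gb", retmode="text"), and read the results with SeqIO.parse(handle, "genbank"), as in the previous answer.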

QUERY: Homo sapiens eukaryotic translation initiation factor 4 gamma, 1 (EIF4G1), transcript variant 2, mRNA
       ID:gi|303227906|ref|NM_198241.2| Len:5538
>Mus musculus eukaryotic translation initiation factor 4, gamma 1 (Eif4g1), transcript variant 2, mRNA
 NM_001005331
 id:gi|56699433|ref|NM_001005331.1| len:5460

   e-value:0 gap:138 bitScore:6818.02

                #####:############################################ exon 1..180 gene:EIF4G1 
QUERY 000000053 GGCGCCGGCTGCGCCTGCGGAGAAGCGGTGGCCGCCGAGCGGGATCTGTG 000000102
                ||||| ||||||||||||||||||||||||||||||||||||||||||||
HIT   000000001 GGCGCTGGCTGCGCCTGCGGAGAAGCGGTGGCCGCCGAGCGGGATCTGTG 000000050
                #####:############################################ exon 1..128 gene:Eif4g1

                ################################################## exon 1..180 gene:EIF4G1 
QUERY 000000103 CGGGGAGCCGGAAATGGTTGTGGACTACGTCTGTGCGGCTGCGTGGGGCT 000000152
                ||||||||||||||||||||||||||||||||||||||||||||||||||
HIT   000000051 CGGGGAGCCGGAAATGGTTGTGGACTACGTCTGTGCGGCTGCGTGGGGCT 000000100
                ################################################## exon 1..128 gene:Eif4g1

                ############::::::::::######                       exon 1..180 gene:EIF4G1 
                                            #:::::::::::::###::::: exon 181..237 gene:EIF4G1 
QUERY 000000153 CGGCCGCGCGGACTGAAGGAGACTGAAGGCCCTCGGATGCCCAGAACCTG 000000202
                ||||||||||||          |||||||             |||     
HIT   000000101 CGGCCGCGCGGA----------CTGAAGG-------------AGA----- 000000122
                ############----------#######-------------###      gene 1..5460 gene:Eif4g1 
                ############----------#######-------------###      exon 1..128 gene:Eif4g1

                ::::::::::::::::::::::##:##:::::::#                exon 181..237 gene:EIF4G1 
                                                   ############### exon 238..331 gene:EIF4G1 
QUERY 000000203 TAGGCCGCACCGTGGACTTGTTCTTAATCGAGGGGGTGCTGGGGGGACCC 000000252
                                      || ||       ||||||||||||||||
HIT   000000123 ----------------------CTGAA-------GGTGCTGGGGGGACCC 000000143
                ----------------------##:##-------#                exon 1..128 gene:Eif4g1 
                                                   ############### exon 129..222 gene:Eif4g1

                #:###############################:###:############ exon 238..331 gene:EIF4G1 
                                   ##############:###:############ CDS 272..5071 gene:EIF4G1 
QUERY 000000253 TGATGTGGCACCAAATGAAATGAACAAAGCTCCACAGTCCACAGGCCCCC 000000302
                | ||||||||||||||||||||||||||||||| ||| ||||||||||||
HIT   000000144 TAATGTGGCACCAAATGAAATGAACAAAGCTCCCCAGCCCACAGGCCCCC 000000193
                #:###############################:###:############ exon 129..222 gene:Eif4g1 
                                   ##############:###:############ CDS 163..4944 gene:Eif4g1

 (...)

                ############:#:#:#####:######:#:########:##:###### exon 4890..5521 gene:EIF4G1 
                ############:#:#:#####:######:#:########:##:###### STS 4948..5505 gene:EIF4G1 
                ############:#:#:#####:######:#:########:##:###### STS 5174..5403 gene:EIF4G1 
QUERY 000005319 TTGGTGTGTCTTGGGGTGGGGAGGGGCACCAACGCCTGCCCCTGGGGTCC 000005368
                |||||||||||| | | ||||| |||||| | |||||||| || ||||||
HIT   000005201 TTGGTGTGTCTTTGCGGGGGGAAGGGCACTACCGCCTGCCTCTAGGGTCC 000005250
                ############:#:#:#####:######:#:########:##:###### exon 4760..5396 gene:Eif4g1

                ::##############:##########:###################### exon 4890..5521 gene:EIF4G1 
                ::##############:##########:###################### STS 4948..5505 gene:EIF4G1 
                ::##############:##########:#######                STS 5174..5403 gene:EIF4G1 
QUERY 000005369 TTTTTTTTATTTTCTGAAAATCACTCTCGGGACTGCCGTCCTCGCTGCTG 000005418
                  |||||||||||||| |||||||||| ||||||||||||||||||||||
HIT   000005251 --TTTTTTATTTTCTG-AAATCACTCTTGGGACTGCCGTCCTCGCTGCTG 000005297
                --##############-##########:###################### exon 4760..5396 gene:Eif4g1

                ######################:#############:############# exon 4890..5521 gene:EIF4G1 
                ######################:#############:############# STS 4948..5505 gene:EIF4G1 
QUERY 000005419 GGGGCATATGCCCCAGCCCCTGTACCACCCCTGCTGTTGCCTGGGCAGGG 000005468
                |||||||||||||||||||||| ||||||||||||| |||||||||||||
HIT   000005298 GGGGCATATGCCCCAGCCCCTGCACCACCCCTGCTGCTGCCTGGGCAGGG 000005347
                ######################:#############:############# exon 4760..5396 gene:Eif4g1

                #:##-############################################: exon 4890..5521 gene:EIF4G1 
                #:##-#################################             STS 4948..5505 gene:EIF4G1 
                                            ######                 polyA_signal 5496..5501 gene:EIF4G1 
                                                                #  polyA_site 5516 gene:EIF4G1 
QUERY 000005469 GGAA-GGGGGGGCACGGTGCCTGTAATTATTAAACATGAATTCAATTAAG 000005517
                | || |||||||||||||||||||||||||||||||||||||||||||| 
HIT   000005348 GAAAGGGGGGGGCACGGTGCCTGTAATTATTAAACATGAATTCAATTAAA 000005397
                #:##:############################################  exon 4760..5396 gene:Eif4g1

                :::#                  exon 4890..5521 gene:EIF4G1 
                   #                  polyA_site 5521 gene:EIF4G1 
QUERY 000005518 CTCAAAAAAAAAAAAAAAAAA 000005538
                   ||||||||||||||||||
HIT   000005398 AAAAAAAAAAAAAAAAAAAAA 000005418