Biopython Error When Extracting annotations['taxonomy'] Information From BioSQL Entries
10.8 years ago
flyzhy.org • 0

When I try to extract the annotations['taxonomy'] information from BioSQL entries, record.annotations['taxonomy'] returns only the organism name of the entry rather than the actual taxonomic lineage. Is there something wrong with my procedure, or is this a bug in Biopython? Thanks for any reply.

For example: BioSQL - bioentry_qualifier_value

bioentry_id  term_id  value                     rank
1            3        Yellowtail ascites virus  0
1            4        Viruses                   1
1            4        dsRNA viruses             2
1            4        Birnaviridae              3
1            4        Aquabirnavirus            4

BioPython:

>>> records[0].annotations['taxonomy']
['Yellowtail ascites virus']
>>> records[0].annotations
{'ncbi_taxid': 59816L, 'data_file_division': 'VRL', 'source': ['Yellowtail ascites virus'], 'references': [Reference(title='cDNA cloning of yellowtail ascites virus segment A and expression of epitope region on VP2', ...), Reference(title='Direct Submission', ...)], 'sequence_version': ['1'], 'keywords': [''], 'taxonomy': ['Yellowtail ascites virus'], 'date': ['25-SEP-1997'], 'organism': 'Yellowtail ascites virus', 'gi': '2073114', 'accessions': ['AB003359']}
>>>

Note that BioSQL typically stores a copy of the NCBI taxonomy, and this overrides any lineage given in the original GenBank file.

Where is the error? Your question is not clear to me.

Did you pre-load the NCBI taxonomy into the BioSQL database? Or were you online, so that Biopython could pull the lineage from the NCBI Entrez API? How did you populate this BioSQL database (e.g. using Biopython or BioPerl)?


Thanks for your reply. I think the problem is the NCBI taxonomy. I imported the data into BioSQL from a GenBank file using Biopython, without loading the NCBI taxonomy. When Biopython extracts the data from BioSQL, the taxonomy information is taken from the "taxon" and "taxon_name" tables, not from the imported "bioentry_qualifier_value" table, which means that I need to load the NCBI taxonomy first. However, I would like to know why Biopython extracts the taxonomy information from the NCBI taxonomy tables rather than from the bioentry qualifiers. Thanks.
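To illustrate why a lineage read from the "taxon" and "taxon_name" tables collapses to just the organism name when no parent taxa are loaded, here is a minimal sketch using Python's sqlite3 module. It is not Biopython's actual code: the two tables are cut down to the columns needed here, the taxon_id values are arbitrary local keys, and the helper names make_db and lineage are made up for this example. The names in the lineage are the ones quoted in this thread.

```python
import sqlite3


def make_db(rows):
    """Build an in-memory database with simplified taxon tables.

    rows: (taxon_id, parent_taxon_id, scientific_name) tuples.
    The real BioSQL tables have more columns; this is a reduced sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, parent_taxon_id INTEGER);
        CREATE TABLE taxon_name (taxon_id INTEGER, name TEXT, name_class TEXT);
        """
    )
    for taxon_id, parent_id, name in rows:
        conn.execute("INSERT INTO taxon VALUES (?, ?)", (taxon_id, parent_id))
        conn.execute(
            "INSERT INTO taxon_name VALUES (?, ?, 'scientific name')",
            (taxon_id, name),
        )
    return conn


def lineage(conn, taxon_id):
    """Follow parent pointers up to the root and return names root-first."""
    names = []
    while taxon_id is not None:
        row = conn.execute(
            """
            SELECT n.name, t.parent_taxon_id
            FROM taxon t JOIN taxon_name n ON n.taxon_id = t.taxon_id
            WHERE t.taxon_id = ? AND n.name_class = 'scientific name'
            """,
            (taxon_id,),
        ).fetchone()
        if row is None:
            break
        name, parent_id = row
        names.insert(0, name)
        if parent_id == taxon_id:  # guard against a self-referencing root
            break
        taxon_id = parent_id
    return names


# Only the organism itself is present (no NCBI taxonomy loaded).
organism_only = make_db([(4, None, "Yellowtail ascites virus")])

# Parent taxa are present as well.
with_ncbi = make_db(
    [
        (1, None, "Viruses"),
        (2, 1, "Birnaviridae"),
        (3, 2, "Aquabirnavirus"),
        (4, 3, "Yellowtail ascites virus"),
    ]
)

print(lineage(organism_only, 4))  # ['Yellowtail ascites virus']
print(lineage(with_ncbi, 4))
```

With only the organism row available, the walk stops immediately, which matches the single-element list seen in records[0].annotations['taxonomy'].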

10.8 years ago
flyzhy.org • 0

I also found that the taxonomy information extracted from the BioSQL database (NCBI taxonomy) differs from the original taxonomy information imported into the BioSQL database.

For example, I used Biopython to import a GenBank (.gbk) file into BioSQL. The taxonomy information of the entry in that file was:

  ORGANISM  Yellowtail ascites virus
            Viruses; dsRNA viruses; Birnaviridae; Aquabirnavirus.

After importing the GenBank file, the bioentry_qualifier_value table of the BioSQL database contained this taxonomy information:

bioentry_id  term_id  value                     rank
1            3        Yellowtail ascites virus  0
1            4        Viruses                   1
1            4        dsRNA viruses             2
1            4        Birnaviridae              3
1            4        Aquabirnavirus            4

Then I used Biopython to extract the taxonomy information of this bioentry, with the NCBI taxonomy loaded:

  ORGANISM  Yellowtail ascites virus
            Viruses; Birnaviridae; Aquabirnavirus; Yellowtail ascites virus.

which means that the "dsRNA viruses" level is missing.
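The difference between the two lineages can be checked mechanically. The short sketch below simply compares the two lists quoted above; the variable names are chosen for this example only.

```python
# Lineage as written in the ORGANISM block of the original GenBank file.
genbank_lineage = ["Viruses", "dsRNA viruses", "Birnaviridae", "Aquabirnavirus"]

# Lineage returned after retrieval with the NCBI taxonomy loaded.
ncbi_lineage = [
    "Viruses",
    "Birnaviridae",
    "Aquabirnavirus",
    "Yellowtail ascites virus",
]

# Order-preserving differences in each direction.
only_in_genbank = [name for name in genbank_lineage if name not in ncbi_lineage]
only_in_ncbi = [name for name in ncbi_lineage if name not in genbank_lineage]

print(only_in_genbank)  # ['dsRNA viruses']
print(only_in_ncbi)     # ['Yellowtail ascites virus']
```

So the retrieved lineage drops "dsRNA viruses" and, unlike the GenBank lineage line, ends with the organism name itself.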

What I want to know is this: if Biopython extracts the taxonomy information from the NCBI taxonomy tables, then the taxonomy information in the "bioentry_qualifier_value" table is useless. Is this reasonable, or is there something I have not considered? Thanks.
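If the lineage exactly as given in the GenBank file is needed, the rows in "bioentry_qualifier_value" can still be read directly with SQL. The sketch below recreates the rows shown above in an in-memory SQLite table and reads them back in rank order. It assumes that term_id 4 is the taxonomy term, as the sample rows suggest; in a real database the term_id should be looked up rather than hard-coded, and the table has constraints that are left out here. The helper name stored_lineage is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced copy of the table, with the same four columns shown in the post.
conn.execute(
    """
    CREATE TABLE bioentry_qualifier_value (
        bioentry_id INTEGER,
        term_id INTEGER,
        value TEXT,
        "rank" INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO bioentry_qualifier_value VALUES (?, ?, ?, ?)",
    [
        (1, 3, "Yellowtail ascites virus", 0),
        (1, 4, "Viruses", 1),
        (1, 4, "dsRNA viruses", 2),
        (1, 4, "Birnaviridae", 3),
        (1, 4, "Aquabirnavirus", 4),
    ],
)

# Assumption: term_id 4 is the taxonomy term in this particular database.
TAXONOMY_TERM_ID = 4


def stored_lineage(conn, bioentry_id, term_id=TAXONOMY_TERM_ID):
    """Return the qualifier values for one entry and term, ordered by rank."""
    cursor = conn.execute(
        """
        SELECT value FROM bioentry_qualifier_value
        WHERE bioentry_id = ? AND term_id = ?
        ORDER BY "rank"
        """,
        (bioentry_id, term_id),
    )
    return [value for (value,) in cursor]


print(stored_lineage(conn, 1))
# ['Viruses', 'dsRNA viruses', 'Birnaviridae', 'Aquabirnavirus']
```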

10.8 years ago
Peter 6.0k

If you don't like the taxonomy override behaviour, please bring this up on the BioSQL mailing list: http://lists.open-bio.org/mailman/listinfo/biosql-l

If you don't think the current Biopython BioSQL documentation is clear enough, please suggest clarifications. Thanks.
