When I try to extract the taxonomy information from BioSQL entries, record.annotations['taxonomy'] shows only the organism name of the entry, not the full taxonomic lineage. Am I doing something wrong, or is this a bug in Biopython? Thanks for any reply.
For example: BioSQL - bioentry_qualifier_value
bioentry_id  term_id  value                     rank
1            3        Yellowtail ascites virus  0
1            4        Viruses                   1
1            4        dsRNA viruses             2
1            4        Birnaviridae              3
1            4        Aquabirnavirus            4
Biopython:
>>> records[0].annotations['taxonomy']
['Yellowtail ascites virus']
>>> records[0].annotations
{'ncbi_taxid': 59816L, 'data_file_division': 'VRL', 'source': ['Yellowtail ascites virus'], 'references': [Reference(title='cDNA cloning of yellowtail ascites virus segment A and expression of epitope region on VP2', ...), Reference(title='Direct Submission', ...)], 'sequence_version': ['1'], 'keywords': [''], 'taxonomy': ['Yellowtail ascites virus'], 'date': ['25-SEP-1997'], 'organism': 'Yellowtail ascites virus', 'gi': '2073114', 'accessions': ['AB003359']}
>>>
Note that BioSQL typically stores a copy of the NCBI taxonomy, and this overrides any lineage given in the original GenBank file.
Where is the error? Your question is not clear to me.
Did you pre-load the NCBI taxonomy into the BioSQL database? Or were you online, so that Biopython could pull the lineage from the NCBI Entrez API? How did you populate this BioSQL database (e.g. using Biopython or BioPerl)?
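For reference, here is a minimal sketch of loading a GenBank file through Biopython while asking it to fetch the lineage from NCBI Entrez at load time. It assumes an SQLite file that already contains the BioSQL schema, network access, and a namespace that does not exist yet. The function name, its parameters, and the file names in the usage comment are hypothetical, chosen only for illustration.

```python
def load_genbank_with_taxonomy(genbank_path, sqlite_path, namespace, email):
    """Load GenBank records into BioSQL, fetching lineages from NCBI Entrez."""
    # Imported inside the function so the sketch can be imported
    # even on a machine without Biopython installed.
    from Bio import Entrez, SeqIO
    from BioSQL import BioSeqDatabase

    # NCBI asks for a contact address on Entrez requests.
    Entrez.email = email

    # Assumption: the BioSQL schema was already created in this SQLite file.
    server = BioSeqDatabase.open_database(driver="sqlite3", db=sqlite_path)
    try:
        # Assumption: the namespace is new; new_database fails if it exists.
        db = server.new_database(namespace)
        # fetch_NCBI_taxonomy=True makes the loader query Entrez for the
        # lineage and fill the taxon and taxon_name tables.
        count = db.load(
            SeqIO.parse(genbank_path, "genbank"),
            fetch_NCBI_taxonomy=True,
        )
        server.commit()
    finally:
        server.close()
    return count


# Hypothetical usage (file names, namespace and address are placeholders):
# load_genbank_with_taxonomy("AB003359.gb", "biosql.sqlite", "viruses", "you@example.org")
```

The default for fetch_NCBI_taxonomy is False, in which case the taxon tables are filled only with what the loader can take from the record itself.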
Thanks for your reply. I think the problem is the NCBI taxonomy. I imported the data into BioSQL from a GenBank file using Biopython, without loading the NCBI taxonomy. When Biopython extracts the data from BioSQL, the taxonomy information is taken from the "taxon" and "taxon_name" tables, not from the imported "bioentry_qualifier_value" table, which means that I need to load the NCBI taxonomy first. However, I would like to know why Biopython extracts the taxonomy information from the NCBI taxonomy tables rather than from the bioentry qualifiers. Thanks.
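In the meantime, the lineage stored as qualifiers can be read directly with SQL. The following is only a sketch: it uses an in-memory SQLite table as a simplified stand-in for the real bioentry_qualifier_value table (the real schema has more constraints and related tables), filled with the rows shown above. It also assumes that term_id 4 is the taxonomy term, which is inferred from those rows; the function name is hypothetical.

```python
import sqlite3

# Simplified stand-in for the BioSQL bioentry_qualifier_value table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bioentry_qualifier_value ("
    "bioentry_id INTEGER, term_id INTEGER, value TEXT, rank INTEGER)"
)
conn.executemany(
    "INSERT INTO bioentry_qualifier_value VALUES (?, ?, ?, ?)",
    [
        (1, 3, "Yellowtail ascites virus", 0),
        (1, 4, "Viruses", 1),
        (1, 4, "dsRNA viruses", 2),
        (1, 4, "Birnaviridae", 3),
        (1, 4, "Aquabirnavirus", 4),
    ],
)


def lineage_from_qualifiers(conn, bioentry_id, taxonomy_term_id):
    """Return the qualifier values for one entry and term, ordered by rank."""
    cursor = conn.execute(
        "SELECT value FROM bioentry_qualifier_value "
        "WHERE bioentry_id = ? AND term_id = ? ORDER BY rank",
        (bioentry_id, taxonomy_term_id),
    )
    return [row[0] for row in cursor]


# Assumption: term_id 4 identifies the taxonomy qualifier in this database.
lineage = lineage_from_qualifiers(conn, 1, 4)
print(lineage)
# ['Viruses', 'dsRNA viruses', 'Birnaviridae', 'Aquabirnavirus']
```

Against a real BioSQL database the term_id would have to be looked up in the term table rather than hard-coded.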