7.8 years ago
Juke34
Does anyone know of a way to retrieve the species name / taxid of organisms that have a whole genome annotation? I would like to be able to do that in Perl. My first thought was to query NCBI through a BioPerl method, but I couldn't find one.
Any help is appreciated.
Could you be more specific about what you mean by whole genome annotation? The human genome is probably one of the best studied and annotated genomes, but I wouldn't claim that the whole genome is annotated.
Yes, sure, I will try to be more precise. By whole genome annotation I mean a whole genome assembly for which an annotation exists. I consider a genome assembly or a genome annotation "whole/complete" if an effort in that direction has been made. I don't take into account whether the annotation is good or bad, whether some genes are missing, or whether the assembly is 80, 90 or 99% complete.
By the way, I don't yet know anyone ready to claim that a genome is really "complete" at both the assembly and the annotation level, even for the human one.
Is this anything you can use? http://ensemblgenomes.org/info/genomes and http://www.ensembl.org/info/about/species.html You can download the metadata table in various formats and filter it by your own criteria.
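If you go the downloaded-table route, a minimal sketch of the filtering step could look like the one below. It is written in Python rather than Perl, but the logic is only "read a tab-separated file, pick two columns", so it ports directly. The column names `species` and `taxonomy_id`, the leading `#` on the header line, and the function name `species_with_taxid` are assumptions for illustration; check them against the header of the file you actually download.

```python
import csv
import io  # only needed to try the function on an in-memory string


def species_with_taxid(handle, name_col="species", taxid_col="taxonomy_id", keep=None):
    """Return (species name, taxid) pairs from a tab-separated metadata table.

    name_col and taxid_col are ASSUMED column names; adjust them to the
    header of the table you downloaded. keep is an optional function that
    receives each row as a dict and returns True for rows to retain.
    """
    rows = csv.reader(handle, delimiter="\t")
    header = next(rows, None)
    if header is None:
        return []  # empty file
    # Some exported tables prefix the header line with '#'; strip it if present.
    header = [h.lstrip("#").strip() for h in header]
    name_i = header.index(name_col)    # raises ValueError if the column is absent
    taxid_i = header.index(taxid_col)
    result = []
    for row in rows:
        if len(row) <= max(name_i, taxid_i):
            continue  # skip truncated or blank lines
        record = dict(zip(header, row))
        if keep is not None and not keep(record):
            continue
        result.append((row[name_i], row[taxid_i]))
    return result
```

With a real file you would call it as `species_with_taxid(open("metadata.tsv", newline=""))`, where `metadata.tsv` is a placeholder for whatever you saved the table as. The `keep` argument is where your own definition of "has an annotation" would go, for example a test on whichever column of the table records the gene build.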
Yes, thank you. It could be a solution, even though I wanted to avoid using the Ensembl API. But I'm still not convinced: the INSDC databases such as NCBI are complete, but I have doubts about Ensembl. I guess Ensembl contains only a subset of the whole genomes.