Alternatives to EUtils to get Taxa from GenBank
7.5 years ago

Hello.

I have several files (>200), each with more than 100 protein sequences, and I want to get the taxonomy of each sequence. My first thought was to use BLAST. Since there are quite a few sequences, I used a Perl script to run remote BLAST searches and then extracted the accession of the best hit. I've tried using EUtils to retrieve the taxonomy, but it's really slow. I've also tried the Bio::LITE::Taxonomy package, but it only works with GI numbers (not accessions), and NCBI doesn't use GIs anymore. My other thought was a standalone BLAST, but the nr database is really big and building it takes a lot of time.

Does anyone know a better way to get the taxonomy of a protein, either from its sequence or from its accession? If not, I'll stick to EUtils.

Thanks!!

taxonomy ncbi EUtils genbank • 1.9k views

You can download pre-formatted nr database files from NCBI, although it is still a big download. I am not sure whether NCBI actually includes taxonomy information in its pre-formatted BLAST indexes.

Have you looked at this solution on Stack Overflow?

7.5 years ago
Sej Modha 5.3k

You can try converting the accession number to a taxonomy ID first (for example, with NCBI's prot.accession2taxid mapping file), and then use the taxonomy ID to fetch the full taxonomy from the taxdump files.
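A minimal Python sketch of that two-step lookup, in case it helps. It assumes you have already downloaded and unpacked NCBI's prot.accession2taxid file (tab-separated, with a header line and the columns accession, accession.version, taxid, gi) and the nodes.dmp and names.dmp files from the taxdump archive. The function names are made up for illustration, and the mapping file is streamed rather than loaded into memory because it is very large.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def load_taxids(accession2taxid_path, wanted):
    """Stream the mapping file and keep only the accessions we need.

    Accessions may be given with or without the version suffix.
    """
    wanted = set(wanted)
    found = {}
    with open_text(accession2taxid_path) as handle:
        next(handle, None)  # skip the header line
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue
            accession, accession_version, taxid = fields[0], fields[1], fields[2]
            for key in (accession_version, accession):
                if key in wanted:
                    found[key] = int(taxid)
    return found


def parse_dmp(path):
    """Yield the fields of each line of a taxdump .dmp file."""
    with open_text(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.endswith("\t|"):
                line = line[:-2]  # drop the end-of-row marker
            yield line.split("\t|\t")


def load_nodes(nodes_path):
    """Map each taxid to (parent taxid, rank)."""
    return {int(f[0]): (int(f[1]), f[2]) for f in parse_dmp(nodes_path)}


def load_names(names_path):
    """Map each taxid to its scientific name, ignoring synonyms."""
    return {
        int(f[0]): f[1]
        for f in parse_dmp(names_path)
        if len(f) >= 4 and f[3] == "scientific name"
    }


def lineage(taxid, nodes, names):
    """Return [(rank, name), ...] from the top of the tree down to taxid."""
    path = []
    current = taxid
    while current in nodes:
        parent, rank = nodes[current]
        if parent == current:  # the root node is its own parent
            break
        path.append((rank, names.get(current, "unknown")))
        current = parent
    return path[::-1]
```

You would call load_nodes and load_names once, call load_taxids with all of your best-hit accessions in a single pass, and then call lineage for each taxid. A taxid that is missing from nodes.dmp simply gives an empty list.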


I didn't know about the prot.accession2taxid file; it was just what I wanted.
