Question: Alternatives to EUtils to get Taxa from GenBank
2.6 years ago, irazoqui.matias wrote:

Hello.

I have more than 200 files, each with more than 100 protein sequences, and I want to get the taxonomy of each sequence. My first thought was to use BLAST. Since there are so many sequences, I used a Perl script to run remote BLAST searches and then extracted the accession of the best hit. I tried using EUtils to retrieve the taxonomy, but it is really slow. I also tried the Bio::LITE::Taxonomy package, but it only works with GI numbers (not accessions), and NCBI no longer uses GIs. My other thought was a standalone BLAST, but the nr database is really big and building it takes a lot of time.

Does anyone know a better way to get the taxonomy of a protein, either from its sequence or from its accession? If not, I'll stick to EUtils.

Thanks!!

genbank eutils taxonomy ncbi

You can download pre-formatted nr database files from NCBI, although it is still a big download. I am not sure whether NCBI actually includes taxonomy information in its pre-formatted BLAST databases.

Have you looked at this solution on Stack Overflow?

written 2.6 years ago by genomax
2.6 years ago, Sej Modha (Glasgow, UK) wrote:

You can try converting the accession number to a taxonomy ID first, and then use the taxonomy ID to fetch the full taxonomy from the taxdump files.
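
A minimal Python sketch of this two-step lookup, in case it helps. It assumes a local copy of NCBI's prot.accession2taxid file (tab-separated, one header line, columns accession, accession.version, taxid, gi) and the nodes.dmp and names.dmp files from the taxdump archive (fields separated by tab, pipe, tab). The function names are made up for illustration, and the column layout should be checked against the README shipped with the files.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def accessions_to_taxids(acc2taxid_path, wanted):
    """Stream the accession2taxid file once, keeping only wanted accessions.

    Accessions may be given with or without the version suffix.
    """
    wanted = set(wanted)
    found = {}
    with open_text(acc2taxid_path) as handle:
        next(handle, None)  # skip the header line (assumed present)
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # ignore malformed lines
            accession, accession_version, taxid = fields[:3]
            for key in (accession, accession_version):
                if key in wanted:
                    found[key] = int(taxid)
    return found


def split_dmp(line):
    """Split one taxdump line on its tab-pipe-tab field separator."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def load_taxonomy(nodes_path, names_path):
    """Return (parents, names): taxid -> parent taxid, taxid -> scientific name."""
    parents = {}
    with open_text(nodes_path) as handle:
        for line in handle:
            fields = split_dmp(line)
            parents[int(fields[0])] = int(fields[1])
    names = {}
    with open_text(names_path) as handle:
        for line in handle:
            fields = split_dmp(line)
            # Columns: tax_id, name, unique name, name class
            if fields[3] == "scientific name":
                names[int(fields[0])] = fields[1]
    return parents, names


def lineage(taxid, parents, names):
    """Walk from a taxid up to the root and return names from root to leaf."""
    path = []
    while taxid in parents:
        path.append(names.get(taxid, str(taxid)))
        if parents[taxid] == taxid:  # the root node is its own parent
            break
        taxid = parents[taxid]
    return path[::-1]
```

To use it, collect the best-hit accessions from all the BLAST results first, call accessions_to_taxids once with the whole set, and then call lineage for each taxonomy ID. The accession2taxid file is large, so streaming it a single time and keeping only the accessions of interest avoids loading it all into memory.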


I didn't know about the prot.accession2taxid file. It was just what I wanted.

written 2.6 years ago by irazoqui.matias