The sources of the NCBI NR library
5.0 years ago
huangjs2017 ▴ 10

I need to obtain taxonomy information (taxon ID) for entries in the NCBI NR library from their protein accession numbers. I found two useful files, prot.accession2taxid.gz and pdb.accession2taxid.gz, at https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/. However, some accession numbers still cannot be mapped to taxonomy information. These accession numbers fall mainly into the following categories:

  1. NCBI shows "Record removed", e.g. "AYN07615.1". Why do removed records appear in the NR library?

  2. Some accession numbers come from unknown sources, for example pir||S69889 and prf||1403304A.

  3. Some accession numbers come from PDB but cannot be found in pdb.accession2taxid.gz, for example 6F1U_FF.

How can I obtain taxonomy information for these special accession numbers?
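A minimal sketch of the lookup described in the question: stream an accession2taxid file once and collect the taxids for a set of accession.version IDs, reporting which ones were never found (those are the "special" cases listed above). It assumes the files are tab-separated with one header line and the columns accession, accession.version, taxid, gi; check the README in the FTP directory before relying on that. The function name `lookup_taxids` is hypothetical.

```python
import gzip


def lookup_taxids(path, wanted):
    """Stream an accession2taxid file and map wanted accession.version IDs to taxids.

    Assumption: tab-separated, one header line, columns
    accession, accession.version, taxid, gi.
    Returns (found, missing): found maps accession.version -> taxid,
    missing is the set of wanted IDs that never appeared in the file.
    """
    found = {}
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        next(handle, None)  # skip the header line
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # fields[1] is accession.version, fields[2] is taxid (assumed layout)
            if len(fields) >= 3 and fields[1] in wanted:
                found[fields[1]] = int(fields[2])
    return found, set(wanted).difference(found)
```

Reading the large file once in a single pass, with the wanted IDs held in a set, avoids one query per accession; the leftover `missing` set can then be run against pdb.accession2taxid.gz (assuming the same layout) before treating anything as unresolved.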

sequence assembly next-gen • 1.8k views

Which version of BLAST/nr are you using (a local copy?)? Or are you simply looking for the full taxonomy of each protein?


I downloaded the NR library from https://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz.

And yes, I am essentially looking for the full taxonomy of each protein. But I cannot get it from the headers in the NR FASTA file because of some non-standard naming and possibly duplicate taxon names (one taxon name can map to multiple taxon IDs).
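One way around the ambiguous organism names is to ignore the bracketed names entirely and pull only the accessions out of each header, then map those to taxids. A minimal sketch, assuming (as I understand the nr FASTA format) that identical sequences are merged into one header whose member deflines are separated by the Control-A character (`\x01`), and that the accession is the first whitespace-delimited token of each member. The function names are hypothetical.

```python
import gzip

# Assumption: merged deflines in nr.gz are separated by Control-A (\x01).
SEPARATOR = "\x01"


def header_accessions(header):
    """Return the accession (first token) of every member defline in one nr header."""
    text = header[1:] if header.startswith(">") else header
    accessions = []
    for part in text.split(SEPARATOR):
        part = part.strip()
        if part:
            accessions.append(part.split(None, 1)[0])
    return accessions


def iter_nr_accessions(path):
    """Yield every accession found in the headers of a gzipped nr FASTA file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                yield from header_accessions(line.rstrip("\n"))
```

The accessions collected this way can be fed straight into an accession-to-taxid lookup, so two organisms that share a name never get confused.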


The 'removed' records may be there because the version you can download always lags a little behind the online version. You can normally check when a record was removed, and I would not be surprised if that date is after the time you downloaded nr from NCBI.

PIR and PRF are not unknown sources, just lesser-known ones. Nowadays both of them (or at least PIR) are normally included in UniProt.

For the PDB one, I think you have to search for 6F1U (the _FF denotes the chain).
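Following the suggestions in this reply, the unresolved accessions can be sorted into buckets before any further lookup: legacy `db||id` identifiers (pir, prf) to be searched elsewhere (e.g. UniProt, for PIR), PDB identifiers reduced to the entry code without the chain, and everything else left as an ordinary NCBI accession. A minimal sketch; the function name and the regular expression for a PDB ID (a digit followed by three alphanumerics, then `_` and a chain ID that may be several characters, like `FF`) are my assumptions, not an NCBI rule.

```python
import re

# Assumed classic PDB ID shape: digit + 3 alphanumerics, then "_" + chain ID.
PDB_ID = re.compile(r"^([0-9][A-Za-z0-9]{3})_([A-Za-z0-9]+)$")


def classify_accession(acc):
    """Return (source, key) for a raw nr accession.

    'db||id' identifiers (e.g. pir, prf) are split into database and ID;
    PDB identifiers are reduced to the entry code by dropping the chain.
    """
    if "||" in acc:
        db, ident = acc.split("||", 1)
        return db.lower(), ident
    match = PDB_ID.match(acc)
    if match:
        return "pdb", match.group(1).upper()
    return "ncbi", acc
```

For example, `classify_accession("6F1U_FF")` gives `("pdb", "6F1U")`, which is the entry code the reply suggests searching for. Ordinary NCBI accessions such as `WP_...` start with a letter, so they do not match the PDB pattern.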
