Querying on non-canonical bacterial IDs
7 days ago

I am trying to automate a query for non-canonical bacterial protein IDs in NCBI.

The IDs are not standard RefSeq accessions (e.g. NP_*); instead they start with HDK (e.g. GenBank: HDK9254199.1).

They exist in NCBI (https://www.ncbi.nlm.nih.gov/protein/HDK9254199), but I have not yet been able to retrieve them programmatically.

In R,

library(rentrez)

# Try the protein database first, then nuccore; each result gets its own variable.
protein_search <- entrez_search(db = "protein", term = "HDK9254199")
protein_summary <- entrez_summary(db = "protein", id = "HDK9254199")
nuccore_search <- entrez_search(db = "nuccore", term = "HDK9254199")
nuccore_summary <- entrez_summary(db = "nuccore", id = "HDK9254199")

None of these work: the summary calls error out and the searches return no hits.

In a perfect world my colleague would have used the reference genome, but they have already ordered an expensive library based on these accessions.

Is there any way to map these HDK accessions back to canonical RefSeq accessions?

microbes protein accession • 307 views
7 days ago
Mensur Dlakic ★ 29k

I don't think there are any other records, including what you call canonical, for this protein.

Maybe this will help you, using Entrez Direct:

efetch -id HDK9254199 -db protein -format fasta

The screen output:

>HDK9254199.1 TPA: RluA family pseudouridine synthase [Staphylococcus aureus USA100-NRS382]
METYEFNITDKEQTGMRVDKLLPELNNDWSRNQIQDWIKAGLVVANDKVVKSNYKVKLNDHIVVTEKEVV
EADILPENLNLDIYYEDDDVAVVYKPKGMVVHPSPGHYTNTLVNGLMYQIKNLSGINGEIRPGIVHRIDM
DTSGLLMVAKNDIAHRGLVEQLMDKSVKRKYIALVHGNIPHDYGTIDAPIGRNKNDRQSMAVVDDGKEAV
THFNVLEHFKDYTLVECQLETGRTHQIRVHMKYIGFPLVGDPKYGPKKTLDIGGQALHAGLIGFEHPVTG
EYIERHAELPQDFEDLLDTIRKRDA
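
One way to check whether an identical sequence is also filed under a RefSeq accession might be the Identical Protein Groups report, which efetch can return with -format ipg. The sketch below is only an illustration: refseq_rows is a hypothetical helper name, and it assumes the report is tab-separated with a header line that contains a column called "Source". It may well return no RefSeq rows for this protein.

```shell
# Hypothetical helper: keep the header plus any RefSeq rows of an IPG report read from stdin.
# Assumption: tab-separated input whose first line is a header with a "Source" column.
refseq_rows() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "Source") col = i; print; next }
    col && $col == "RefSeq"
  '
}

# Usage (needs network access and Entrez Direct installed):
#   efetch -db protein -id HDK9254199 -format ipg | refseq_rows
```

If only the header line comes back, the group has no RefSeq member, which would be consistent with there being no other records for this protein.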

Assuming you have all the IDs of interest in IDs.txt:

cat IDs.txt | xargs -I {} efetch -id {} -db protein -format fasta >> proteins.fas
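
With a long ID list it can help to know which accessions returned nothing, since the one-liner above would skip them silently. A minimal sketch of that idea follows. The function name fetch_proteins, the file names, and the FETCH_CMD override (intended for dry runs only) are all hypothetical and not part of Entrez Direct.

```shell
# Hypothetical helper: fetch each ID listed in a file and log the ones that return nothing.
# FETCH_CMD defaults to Entrez Direct's efetch; overriding it is meant for dry runs only.
fetch_proteins() {
  ids_file=$1
  out_fasta=$2
  failed_log=$3
  : > "$out_fasta"    # start with empty output files
  : > "$failed_log"
  while IFS= read -r id || [ -n "$id" ]; do
    [ -z "$id" ] && continue    # skip blank lines
    # </dev/null keeps the fetch command from consuming the rest of the ID list
    record=$("${FETCH_CMD:-efetch}" -db protein -id "$id" -format fasta </dev/null) || record=""
    if [ -n "$record" ]; then
      printf '%s\n' "$record" >> "$out_fasta"
    else
      printf '%s\n' "$id" >> "$failed_log"
    fi
  done < "$ids_file"
}

# Usage (needs network access and Entrez Direct installed):
#   fetch_proteins IDs.txt proteins.fas failed_ids.txt
```

Any accession that ends up in failed_ids.txt can then be checked by hand on the NCBI website.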