Question: Diamond blastp output
12 months ago by
United States
ARich90 wrote:

Dear Biostar user,

I have a question regarding DIAMOND output. I ran diamond blastp on my contigs against the NR database, then used diamond view to convert the result to m8 format. In this m8 file, the subject seqid looks like "WP_129184883.1", although I was expecting the GenBank ID (gi|...|). Can someone explain why I get the RefSeq protein ID, and how I can convert it to GenBank IDs?

Thank you in advance! Best, AR

sequence assembly • 712 views

Did you try searching on the forum (or on Google) for "convert accession to genbank id"?

written 12 months ago by RamRS27k

Yes, I tried the R package "rentrez":

library(rentrez)
search <- entrez_search(db="protein", term="WP_129184883[Accn]")
(links <- entrez_link(dbfrom="protein", db="nuccore", id=search$ids))

This gives me the taxid. But I am looking for a way to change the diamond blastp output so that it shows the gi (gi|..|) instead of the RefSeq protein ID (WP_129184883.1).


written 12 months ago by ARich90

NCBI is deprecating gi numbers, so you should probably stick with the accession, which you already have. If you still want them, you can use eutils (or reutils in R). Searching the protein database for your accession with the [Accn] field tag returns the gi in the Id field.
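As a rough sketch of that lookup applied to a whole m8 file, something like the following might work. It assumes the file has the standard 12 tab-separated BLAST tabular columns, that the rentrez package is installed, and that the first Id returned for an accession is the one you want. The function names and the file name "matches.m8" are made up for illustration.

```r
# Column names of the 12-column BLAST tabular (m8) layout (assumed here)
m8_cols <- c("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
             "qstart", "qend", "sstart", "send", "evalue", "bitscore")

# Read an m8 file and return the unique subject accessions (column 2)
read_subject_accessions <- function(path) {
  hits <- read.delim(path, header = FALSE, col.names = m8_cols,
                     stringsAsFactors = FALSE)
  unique(hits$sseqid)
}

# Drop the version suffix: "WP_129184883.1" becomes "WP_129184883"
strip_version <- function(acc) sub("\\.[0-9]+$", "", acc)

# Search the protein database for one accession and return the first Id
# (the gi), or NA if nothing is found. Needs network access and rentrez.
lookup_gi <- function(acc) {
  res <- rentrez::entrez_search(db = "protein",
                                term = paste0(strip_version(acc), "[Accn]"))
  if (length(res$ids) == 0) NA_character_ else res$ids[1]
}

# Usage (hypothetical file name):
# accs <- read_subject_accessions("matches.m8")
# gis  <- vapply(accs, lookup_gi, character(1))
```

This sends one request per accession, so for a large hit table you would likely want to deduplicate first (as above) and stay within NCBI's request rate limits, for example by setting an API key.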

You can also fetch and parse ASN.1 format records, which will have gi information for all entries that have a gi entry.

modified 12 months ago • written 12 months ago by RamRS27k

What do you want to do from this point onwards? As @Ram already indicated, gi identifiers are deprecated for external use (by people like us).

written 12 months ago by genomax85k