Diamond blastp output

4.8 years ago

ARich ▴ 130

Dear Biostars users,

I have a question regarding diamond output. I ran diamond blastp on my contigs against the NR database, then used diamond view to convert the results to m8 format. In this m8 file, the subject seqid is something like "WP_129184883.1", although I was expecting the GenBank ID (gi|...|). Can someone explain why I get RefSeq protein IDs, and how I can convert them to GenBank IDs?

Thank you in advance! Best, AR

sequence assembly • 3.1k views

Did you try searching on the forum (or on Google) for "convert accession to genbank id"?

ADD REPLY • link 4.8 years ago by Ram 43k

Yes, I did. I tried the R package "rentrez":

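library(rentrez)
# Look up the protein record by its accession
search <- entrez_search(db="protein", term="WP_129184883[Accn]")
# Find nuccore records linked to that protein (outer parentheses print the result)
(links <- entrez_link(dbfrom="protein", db="nuccore", id=search$ids))
links$links$protein_nuccore_wp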

This gives me the taxid, but I am looking for a way to have the diamond blastp output report GI numbers (gi|..|) instead of RefSeq protein IDs (WP_129184883.1).

Thanks

ADD REPLY • link 4.8 years ago by ARich ▴ 130

NCBI is deprecating gi numbers, so you should probably go with the accession, which you already have. If you still wish to get them, you can use eutils (or reutils in R). Searching for your protein gives the gi in the Id field:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=WP_129184883[Accn]

You can also fetch and parse ASN.1 format records, which will have gi information for all entries that have a gi entry.
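As a rough illustration of the esearch approach, here is a minimal Python sketch. It assumes the m8 file is tab-separated with the subject ID in the second column, and that the esearch XML response lists IDs as `<Id>` elements under `<IdList>`. The function names (`subject_accessions`, `esearch_url`, `parse_ids`, `accession_to_ids`) are made up for this example and are not part of diamond or any NCBI library.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def subject_accessions(m8_lines):
    """Unique subject IDs (column 2) from tab-separated m8 lines, in input order."""
    seen = {}
    for line in m8_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            seen.setdefault(fields[1], None)
    return list(seen)


def esearch_url(accession, db="protein"):
    """Build an esearch URL for one accession (version suffix such as '.1' dropped)."""
    base = accession.split(".")[0]
    query = urllib.parse.urlencode({"db": db, "term": f"{base}[Accn]"})
    return f"{ESEARCH}?{query}"


def parse_ids(xml_text):
    """Return the text of every <Id> element under <IdList> in an esearch response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("./IdList/Id")]


def accession_to_ids(accession):
    """Query NCBI (network access required) and return the IDs for one accession."""
    with urllib.request.urlopen(esearch_url(accession)) as resp:
        return parse_ids(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Single lookup for the accession from the question.
    print(accession_to_ids("WP_129184883.1"))
```

For a whole m8 file you would loop over `subject_accessions(open(path))` and call `accession_to_ids` on each entry. Keep the request rate low, since NCBI throttles eutils traffic.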

ADD REPLY • link 4.8 years ago by Ram 43k

What do you want to do from this point onwards? As @Ram already indicated, gi identifiers are deprecated for external use (by people like us).

ADD REPLY • link 4.8 years ago by GenoMax 142k
