Question: Genbank GI accession format for DAVID input
12 months ago by
jfo10 wrote:


I would like some advice on the proper format of GENBANK_GI_ACCESSION identifiers for DAVID functional analysis. I tried these formats:

  1. gi123456
  2. gi|123456
  3. 123456

Sadly, none of them worked, and I could not find any examples. I prefer GI accessions because all of my unigenes of interest have them. Some have RefSeq counterparts or gene symbols, but not all of them do. And yes, I'm not sure what I'm doing. Any help will be appreciated!

12 months ago by
Istvan Albert ♦♦ 82k
University Park, USA
Istvan Albert ♦♦ 82k wrote:

Using GI numbers is a bad idea. NCBI stopped using them and new data is no longer released with GI numbers, hence you are guaranteed to operate on outdated information. You may run into various kinds of mysterious errors as well, although the problems with DAVID are simply that it is an atrocious system to begin with.

Having a GI number without an accession number also sounds quite unexpected, and the chances that a tool would work with such data are again much reduced. You can convert GI numbers to accession numbers with Entrez Direct:

efetch -db nuccore -id 663070995,568815587 -format acc

to produce:


or an even simpler way as stated here:

with a command such as:

curl ',568815587&rettype=acc'

which will produce the same output:


Verify that your GI numbers do indeed lack an accession number.


You could even just make the right URL and paste that into your browser
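Making the right URL by hand can be sketched in the shell as follows. This is a minimal sketch, assuming the public NCBI E-utilities `efetch.fcgi` endpoint and its `db`, `id`, and `rettype` parameters; the function name `build_efetch_url` is hypothetical, and the `nuccore` database simply mirrors the efetch command above (protein GI numbers would presumably need `protein` instead).

```shell
#!/bin/sh
# Hypothetical helper: build an E-utilities efetch URL that asks for
# accession numbers (rettype=acc) for a list of GI numbers.
# Assumption: the endpoint below is the public NCBI E-utilities base URL.
EUTILS_BASE="https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

build_efetch_url() {
    db="$1"      # Entrez database, e.g. nuccore or protein
    shift
    # Join the remaining arguments (GI numbers) with commas.
    ids=$(printf '%s,' "$@")
    ids=${ids%,}
    printf '%s?db=%s&id=%s&rettype=acc\n' "$EUTILS_BASE" "$db" "$ids"
}

# Print the URL for the two GI numbers used in the efetch example.
build_efetch_url nuccore 663070995 568815587
```

The printed URL can be pasted into a browser, or passed to `curl` inside single quotes so that the shell does not interpret the `&` characters.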


Thank you for the prompt answer. I do have the accessions; however, I am not sure how to use the different input types for the analysis in DAVID. For example, my accessions are tagged ref (mostly XP_), dbj, gb, or sp. How do I convert these accession IDs to a format DAVID can interpret? I am having a hard time finding a way to do this. For example, I tried the Retrieve/ID mapping tool of UniProtKB, but not all of my unigenes with a GI number matched a UniProt entry. I do not know how to proceed from there.

written 12 months ago by jfo10
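One way to see which accession types are present is to split the hit identifiers into their parts. The sketch below assumes the hits look like `gi|<number>|<db>|<accession>|`, the legacy NCBI layout that tags such as `ref`, `gb`, `dbj`, and `sp` come from; the function name `split_hit_ids` is hypothetical, and the GI numbers and the XP_ accession in the sample input are invented placeholders.

```shell
#!/bin/sh
# Hypothetical helper: read hit identifiers of the assumed form
#   gi|<number>|<db>|<accession>|
# from standard input and print GI, database tag, and accession
# as tab-separated columns. Lines that do not start with "gi" are skipped.
split_hit_ids() {
    awk -F'|' '$1 == "gi" && NF >= 4 { printf "%s\t%s\t%s\n", $2, $3, $4 }'
}

# Sample input; the GI numbers and the XP_ accession are invented placeholders.
printf '%s\n' \
    'gi|111111|ref|XP_000001.1|' \
    'gi|222222|gb|ABC87995.1|' |
    split_hit_ids
```

The accession column is what an ID converter would take as input, and piping the output through `cut -f2 | sort | uniq -c` shows how many hits come from each source database.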

DAVID ought to understand many different types of accession numbers. Try something simple first: use only a subset of the gene names to get your bearings and ensure that it works. If you are not sure what to pick, start here

take the 20 gene names from the first column and see if you can make DAVID work.
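Pulling out such a subset can be sketched in the shell. This is only an illustration, assuming a tab-separated file without a header row; the function name `first_ids` and the file name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the first N values (default 20) of the first
# column of a tab-separated file. Assumes the file has no header row.
first_ids() {
    file="$1"
    n="${2:-20}"
    cut -f1 "$file" | head -n "$n"
}

# Example with a hypothetical file name:
# first_ids genes.tsv 20 > subset.txt
```

The resulting list, one identifier per line, can then be pasted into the tool's input box.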

I would also recommend an alternative tool

and the converter here

written 12 months ago by Istvan Albert ♦♦ 82k

I have the Official Gene Symbols, which actually work. My confusion comes from the unigenes that have nr hits but no gene symbols. Should I just proceed with those that have gene symbols? This is why I was looking for a way to give all the unigenes with nr hits some other identifier (e.g. gene symbol, UniProt) to represent them. I'm not even sure if this is possible, though.

As I mentioned, most of my unigenes had protein hits with XP_ accessions, but some have a gb| or dbj| reference instead. The GI number is the only ID that is present for all my unigenes with nr hits. Is it possible to convert all unigenes with nr hits into their corresponding UniProt or gene symbol IDs? I'm asking because I could not find an XP_ counterpart for those with gb| or dbj| (e.g. gb|ABC87995.1). Or is it okay to proceed with the analysis using just the unigenes with gene symbols? I'm so confused I don't know if my questions are valid.

written 12 months ago by jfo10