Question: Genbank GI accession format for DAVID input
jfo10 wrote 9 months ago:

Hi!

I would like some advice on the proper format of GENBANK_GI_ACCESSION identifiers for DAVID functional analysis. I tried these formats:

  1. gi123456
  2. gi|123456
  3. 123456

Sadly, none of them worked, and I could not find any examples. I prefer GI numbers because all of my unigenes of interest have one. Some also have RefSeq counterparts and gene symbols, but not all of them do. And yes, I'm not sure what I'm doing. Any help will be appreciated!

Istvan Albert ♦♦ 81k (University Park, USA) wrote 9 months ago:

Using GI numbers is a bad idea. NCBI stopped using them and no longer releases data with GI numbers, so you are guaranteed to be operating on outdated information. You may run into various kinds of mysterious errors as well, although the problems with DAVID are simply that it is an atrocious system to begin with.

Having a GI number without an accession number also sounds quite unexpected; the chances that a tool would work with such data are again much reduced. You can convert GI numbers to accession numbers with Entrez Direct:

efetch -db nuccore -id 663070995,568815587 -format acc

to produce:

NM_001178.5
NC_000011.10

or an even simpler way as stated here:

https://ncbiinsights.ncbi.nlm.nih.gov/2016/12/06/converting-gi-numbers-to-accession-version/

with a command such as:

curl 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=663070995,568815587&rettype=acc'

which will produce the same output:

NM_001178.5
NC_000011.10

Verify that your GI numbers do indeed lack an accession number.

PS:

You could even just construct the right URL and paste it into your browser.
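A minimal sketch of building that URL from a list of GI numbers. The function name gi_to_acc_url is made up for illustration; the endpoint and the db, id and rettype parameters are the ones from the curl command above.

```shell
# gi_to_acc_url is a hypothetical helper: it joins its arguments with
# commas and prints the efetch URL that returns accession.version.
gi_to_acc_url() {
    printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=%s&rettype=acc\n' \
        "$(IFS=,; echo "$*")"
}

# Print the URL for the two GI numbers used above.
gi_to_acc_url 663070995 568815587

# To fetch the accessions directly (needs network access):
#   curl -s "$(gi_to_acc_url 663070995 568815587)"
```

The printed URL can be pasted into a browser as-is. For GI numbers that belong to protein hits, the db value would presumably need to be protein rather than nuccore.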


Thank you for the prompt answer. I do have the accessions; however, I am not sure how to use different input types for the analysis in DAVID. For example, my accessions are tagged ref (mostly XP_), dbj, gb, or sp. How do I convert these accession IDs to a format that DAVID can interpret? I am having a hard time finding a way to do this. For example, I tried the Retrieve/ID Mapping tool of UniProtKB, but not all of my unigenes with GI numbers matched a UniProt entry. I do not know how to proceed from there.

Reply written 9 months ago by jfo10

DAVID ought to understand many different types of accession numbers. Try something simple first: use only a subset of the gene names to get your bearings and ensure that it works. If you are not sure what to pick, start here:

http://data.biostarhandbook.com/redo/zika/zika-up-regulated.csv

Take the 20 gene names from the first column and see if you can make DAVID work.
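One way to pull those names out on the command line, as a rough sketch. It assumes the file is plain comma-separated text with a single header line and no quoted commas, which has not been checked here; first_ids is a made-up helper name.

```shell
# first_ids is a hypothetical helper: skip one header line (assumed),
# keep the first comma-separated column, and print the first N values
# (default 20).
first_ids() {
    tail -n +2 | cut -d, -f1 | head -n "${1:-20}"
}

# Intended use (needs network access):
#   curl -s http://data.biostarhandbook.com/redo/zika/zika-up-regulated.csv | first_ids 20

# Offline check with made-up placeholder rows:
printf 'name,value\nGENE_A,1\nGENE_B,2\nGENE_C,3\n' | first_ids 2
```

If the file turns out to have no header line, drop the tail step. The resulting list can be pasted straight into the DAVID upload box.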

I would also recommend an alternative tool

https://biit.cs.ut.ee/gprofiler/gost

and the converter here

https://biit.cs.ut.ee/gprofiler/convert

Reply written 9 months ago by Istvan Albert ♦♦ 81k

I have the Official Gene Symbols, which actually work. My confusion comes from the unigenes that have nr hits but no gene symbols. Should I just proceed with those that have gene symbols? This is why I was looking for a way to give all the unigenes with nr hits some other identifier (e.g. gene symbol, UniProt) to represent them all. I'm not even sure if this is possible, though.

As I mentioned, most of my unigenes had protein hits with XP_ accessions, but some had gb| or dbj| references instead. The GI number is the only ID that is present for all my unigenes with nr hits. Is it possible to convert all unigenes with nr hits into their corresponding UniProt or gene symbol IDs? I'm asking because I could not find XP_ counterparts for those with gb| or dbj| (e.g. gb|ABC87995.1). Or is it okay to proceed with the analysis using just the unigenes with gene symbols? I'm so confused that I don't know if my questions are valid.

Reply written 9 months ago by jfo10
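Whatever converter ends up being used, a first step is getting a clean accession list out of the hit identifiers. A small sketch, assuming the identifiers follow the older pipe-delimited layout (gi|number|db|accession| or db|accession|); extract_accession is a made-up helper name, and 123456 is the placeholder GI from the question.

```shell
# extract_accession is a hypothetical helper: read one identifier per
# line and print only the accession.version part.
extract_accession() {
    awk -F'|' '
        $1 == "gi" && NF >= 4 { print $4; next }  # gi|<number>|<db>|<accession>|
        NF >= 2               { print $2; next }  # <db>|<accession>|
                              { print $1 }        # already a bare accession
    '
}

# Both lines print the same accession.
printf '%s\n' 'gi|123456|gb|ABC87995.1|' 'gb|ABC87995.1|' | extract_accession
```

The bare accessions (XP_, gb, dbj or sp entries alike) can then be pasted into a converter such as the g:Profiler one linked above; whether every accession maps to a gene symbol is not guaranteed.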