Question: How To Get FlyBase Gene IDs From KEGG Gene IDs?
2
8.1 years ago by
Dejian1.3k
United States
Dejian1.3k wrote:

I have a list of KEGG gene IDs for the fruit fly, and I want to get the corresponding sequences. I think mapping the KEGG gene IDs to FlyBase gene IDs and then extracting the sequences from FlyBase would be a good way to do this. I am sure a mapping between KEGG gene IDs and FlyBase gene IDs exists (see the examples below), but I do not know where to find it. Could someone give me a hint? Thanks!

Dmel_CG10219 <> FBgn0039112

Dmel_CG10320 <> FBgn0034645

kegg identifiers • 3.2k views
ADD COMMENTlink modified 8.1 years ago by Lars Juhl Jensen11k • written 8.1 years ago by Dejian1.3k
4
8.1 years ago by
Casey Bergman18k
Athens, GA, USA
Casey Bergman18k wrote:

The CG identifiers are not KEGG gene IDs but FlyBase Computed Gene IDs. The CG ID nomenclature was inherited from Celera/Berkeley, and gene model names were then rationalized into the FlyBase paradigm, where every object type is given an FBxx ID, with FB = FlyBase and xx = object type (in this case gn, for gene).

To get a mapping between CG IDs and FBgn IDs, use BioMart with this query:

ADD COMMENTlink written 8.1 years ago by Casey Bergman18k

Thanks, Casey. The query you provided is exactly what I want.

ADD REPLYlink written 8.1 years ago by Dejian1.3k
2
8.1 years ago by
Copenhagen, Denmark
Lars Juhl Jensen11k wrote:

If you have KEGG gene IDs, and you want to get the sequences, why not simply download the sequences from KEGG and save yourself the painful and error-prone mapping exercise?

You can download them all from the KEGG FTP site: ftp://ftp.genome.jp/pub/kegg/genes/organisms/dme/
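Once the sequence file is on disk, pulling out only the genes in your list is a small filtering step. Here is a minimal Python sketch, assuming the file is plain FASTA and that each header starts with the gene ID, optionally prefixed by an organism code (for example `>dme:Dmel_CG10219 ...`). That header layout, the function names and the toy sequences are assumptions for illustration, so check them against the file you actually download.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_by_id(lines, wanted):
    """Return {gene_id: sequence} for records whose ID is in `wanted`."""
    selected = {}
    for header, seq in read_fasta(lines):
        tokens = header.split()
        if not tokens:
            continue
        # Assumed header layout: "dme:Dmel_CG10219 description ..."
        gene_id = tokens[0].split(":", 1)[-1]  # drop an organism prefix if present
        if gene_id in wanted:
            selected[gene_id] = seq
    return selected


# Toy records; the sequences and the second ID are made up for illustration.
example = """>dme:Dmel_CG10219 hypothetical description
ATGGCC
TTGA
>dme:Dmel_CG99999 not in the list
ATGTAA
""".splitlines()

hits = select_by_id(example, {"Dmel_CG10219", "Dmel_CG10320"})
print(hits)
```

For a real file, pass an open file handle instead of `example`. IDs in your list that are missing from `hits` were not found in the file.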

ADD COMMENTlink written 8.1 years ago by Lars Juhl Jensen11k

You are right. This is the most convenient way to do my current job. Many thanks, Lars.

ADD REPLYlink written 8.1 years ago by Dejian1.3k
0
8.1 years ago by
Joachim2.8k
San Francisco, California
Joachim2.8k wrote:

You are accessing the FlyBase database directly, right?

CG10219 and CG10320 are synonyms in FlyBase, which you can use to get the FlyBase gene ID as follows:

SELECT DISTINCT
    f.uniquename
FROM
    feature f,
    synonym s,
    feature_synonym fs
WHERE
    s.name = 'CG10219'
    AND
    s.synonym_id = fs.synonym_id
    AND
    f.feature_id = fs.feature_id
    AND
    f.organism_id = 1;

Some explanation:

  • You need DISTINCT because feature_synonym contains several rows with the same synonym_id and feature_id that differ only in pub_id; without it, the query simply returns the same FlyBase ID several times.
  • You need to match organism_id because otherwise you also get non-dmel results.
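
To see the effect of DISTINCT without a FlyBase connection, here is a small Python sketch that runs the same lookup against an in-memory SQLite database. The three tables are a heavily simplified stand-in for the real schema (only the columns used in the query), the rows are toy data, and `FBgn9999999` is a made-up non-dmel feature. The lookup is written with explicit JOINs, which is equivalent to the comma-separated FROM list above.

```python
import sqlite3

# Simplified stand-in for the three tables used in the query (an assumption,
# not the full FlyBase schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT, organism_id INTEGER);
CREATE TABLE synonym (synonym_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature_synonym (feature_id INTEGER, synonym_id INTEGER, pub_id INTEGER);

INSERT INTO feature VALUES (1, 'FBgn0039112', 1);   -- dmel gene from the question
INSERT INTO feature VALUES (2, 'FBgn9999999', 2);   -- hypothetical non-dmel feature
INSERT INTO synonym VALUES (10, 'CG10219');
INSERT INTO feature_synonym VALUES (1, 10, 100);    -- same feature and synonym,
INSERT INTO feature_synonym VALUES (1, 10, 101);    -- but two different pub_ids
INSERT INTO feature_synonym VALUES (2, 10, 102);
""")

QUERY = """
SELECT {distinct} f.uniquename
FROM feature f
JOIN feature_synonym fs ON f.feature_id = fs.feature_id
JOIN synonym s ON s.synonym_id = fs.synonym_id
WHERE s.name = ? AND f.organism_id = ?
"""


def lookup(name, distinct=True, organism_id=1):
    """Return the feature uniquenames that carry `name` as a synonym."""
    sql = QUERY.format(distinct="DISTINCT" if distinct else "")
    return [row[0] for row in conn.execute(sql, (name, organism_id))]


with_distinct = lookup("CG10219")
without_distinct = lookup("CG10219", distinct=False)
print(with_distinct)
print(without_distinct)
```

The synonym name is passed as a bound parameter, so the same function can be called in a loop over a whole list of CG IDs.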
ADD COMMENTlink written 8.1 years ago by Joachim2.8k

No. I am trying to download a batch of sequences from the FlyBase website. The batch download accepts CG-style IDs. However, a CG-style ID often corresponds to more than one FBgn ID, and only one of them is the one I want. This makes the batch download on the website inconvenient to use. Is there any other method? Thanks.
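
To at least find out which IDs are affected, a two-column CG-to-FBgn table can be split into unambiguous and ambiguous entries. Below is a minimal Python sketch, assuming the mapping is available as (CG ID, FBgn ID) pairs. The first two pairs come from the question, while `CG00000` and the `FBgnXXXXXX` values are made-up placeholders and `split_by_ambiguity` is a hypothetical helper name.

```python
from collections import defaultdict


def split_by_ambiguity(pairs):
    """Split (cg_id, fbgn_id) pairs into one-to-one and one-to-many mappings."""
    targets = defaultdict(set)
    for cg_id, fbgn_id in pairs:
        targets[cg_id].add(fbgn_id)
    unique = {cg: next(iter(ids)) for cg, ids in targets.items() if len(ids) == 1}
    ambiguous = {cg: sorted(ids) for cg, ids in targets.items() if len(ids) > 1}
    return unique, ambiguous


pairs = [
    ("CG10219", "FBgn0039112"),   # from the question
    ("CG10320", "FBgn0034645"),   # from the question
    ("CG00000", "FBgnXXXXXX1"),   # made-up placeholder
    ("CG00000", "FBgnXXXXXX2"),   # made-up placeholder
]

unique, ambiguous = split_by_ambiguity(pairs)
print(unique)
print(ambiguous)
```

Only the entries in `ambiguous` then need to be resolved by hand.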

ADD REPLYlink written 8.1 years ago by Dejian1.3k