Playing With Mysql/Ensembl (I): Mapping 'Gene' To 'Xref'
3
0
Entering edit mode
11.1 years ago

Hi all, I'm playing with the anonymous mysql server for ENSEMBL

mysql -h ensembldb.ensembl.org --port 5306  -u anonymous -D homo_sapiens_core_57_37b

and I'm trying to do some reverse engineering to see how the tables are linked (I've generated a DOT file mapping the foreign keys). I'm playing with 'my' protein ROXAN ( http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000100403 )

I can get the record in the gene table for this protein:

mysql> select * from gene_stable_id as GSI,gene as G
       where GSI.stable_id ="ENSG00000100403" and
       G.gene_id=GSI.gene_id\G
*************************** 1. row ***************************
                gene_id: 55916
              stable_id: ENSG00000100403
                version: 9
           created_date: 2008-04-29 11:17:41
          modified_date: 2009-08-05 14:27:16
                gene_id: 55916
                biotype: protein_coding
            analysis_id: 8047
          seq_region_id: 27510
       seq_region_start: 41697526
         seq_region_end: 41756151
      seq_region_strand: 1
        display_xref_id: 16961747
                 source: ensembl
                 status: KNOWN
            description: zinc finger CCCH-type containing 7B [Source:HGNC Symbol;Acc:30869]
             is_current: 1
canonical_transcript_id: 146331
   canonical_annotation: NULL
1 row in set (0.07 sec)

I can also retrieve 5 records in XREF for the keyword "ROXAN".

mysql> select * from external_synonym as S,
       xref as X ,
       external_db as D
       where D.external_db_id=X.external_db_id and
       S.synonym="ROXAN"
       and X.xref_id=S.xref_id \G
*************************** 1. row ***************************
               xref_id: 298451
               synonym: RoXaN
               xref_id: 298451
        external_db_id: 2310
         dbprimary_acc: OTTHUMT00000337751
         display_label: OTTHUMT00000337751
               version: 2
           description: NULL
             info_type: NULL
             info_text: NULL
        external_db_id: 2310
               db_name: Vega_transcript
            db_release: 1
                status: KNOWNXREF
dbprimary_acc_linkable: 1
display_label_linkable: 0
              priority: 5
       db_display_name: Vega transcript
                  type: MISC
     secondary_db_name: NULL
    secondary_db_table: NULL
           description: NULL
*************************** 2. row ***************************
               xref_id: 16506778
               synonym: RoXaN
               xref_id: 16506778
        external_db_id: 1100
         dbprimary_acc: 30869
         display_label: ZC3H7B
               version: 0
           description: zinc finger CCCH-type containing 7B
             info_type: DIRECT
             info_text: Generated via havana
        external_db_id: 1100
               db_name: HGNC
            db_release: 1
                status: KNOWNXREF
dbprimary_acc_linkable: 1
display_label_linkable: 1
              priority: 100
       db_display_name: HGNC Symbol
                  type: PRIMARY_DB_SYNONYM
     secondary_db_name: NULL
    secondary_db_table: NULL
           description: NULL
*************************** 3. row ***************************
               xref_id: 15396769
               synonym: RoXaN
               xref_id: 15396769
        external_db_id: 1300
         dbprimary_acc: 23264
         display_label: ZC3H7B
               version: 0
           description: zinc finger CCCH-type containing 7B
             info_type: DEPENDENT
             info_text: NULL
        external_db_id: 1300
               db_name: EntrezGene
            db_release: 1
                status: KNOWNXREF
dbprimary_acc_linkable: 1
display_label_linkable: 0
              priority: 10
       db_display_name: EntrezGene
                  type: MISC
     secondary_db_name: NULL
    secondary_db_table: NULL
           description: NULL
*************************** 4. row ***************************
               xref_id: 16966653
               synonym: RoXaN
               xref_id: 16966653
        external_db_id: 12405
         dbprimary_acc: MPRIP-205
         display_label: MPRIP-205
               version: 0
           description: NULL
             info_type: MISC
             info_text: via havana
        external_db_id: 12405
               db_name: HGNC_automatic_transcript
            db_release: 1
                status: KNOWNXREF
dbprimary_acc_linkable: 1
display_label_linkable: 0
              priority: 5
       db_display_name: HGNC (automatic)
                  type: MISC
     secondary_db_name: NULL
    secondary_db_table: NULL
           description: NULL
*************************** 5. row ***************************
               xref_id: 16961747
               synonym: RoXaN
               xref_id: 16961747
        external_db_id: 12300
         dbprimary_acc: 30869
         display_label: ZC3H7B
               version: 0
           description: NULL
             info_type: MISC
             info_text: via havana
        external_db_id: 12300
               db_name: HGNC_curated_gene
            db_release: 1
                status: KNOWNXREF
dbprimary_acc_linkable: 1
display_label_linkable: 0
              priority: 5
       db_display_name: HGNC (curated)
                  type: MISC
     secondary_db_name: NULL
    secondary_db_table: NULL
           description: NULL
5 rows in set (0.17 sec)

I wish I could find the gene(s) from the keyword ROXAN.

OK, here I got 5 'xref' records and I know that gene is linked to xref via gene.displayxrefid=xref.xref_id . Xref can also have links to some other db (e.g. transcript), so I guess there is something in xref, something like a flag, telling that 'this' xref record is pointing to gene.

Is it true ? What is that flag ?

Thanks

Pierre

ensembl mysql • 3.2k views
ADD COMMENT
2
Entering edit mode
11.1 years ago

OK, got an answer from Jan Aerts on the ensembl mailing list:

The object_xref table is what you're looking for. This table makes the connection from the xref table to one of the following other tables: gene, transcript and translation. Example rows:

mysql> select * from object_xref limit 2;
+----------------+------------+---------------------+----------+--------------------+-------------+
| object_xref_id | ensembl_id | ensembl_object_type | xref_id  | linkage_annotation | analysis_id |
+----------------+------------+---------------------+----------+--------------------+-------------+
|       12799800 |     192551 | Transcript          | 11798151 | NULL               |        NULL |
|       12799799 |     192551 | Transcript          | 11798150 | NULL               |        NULL |
+----------------+------------+---------------------+----------+--------------------+-------------+
2 rows in set (0.14 sec)
  

Hope this helps, jan.

ADD COMMENT
1
Entering edit mode
11.1 years ago
Yuri ★ 1.6k

You have the HUGO gene name in display_label field.

If you want to show also Ensembl id, you can connect both queries with

... and X.xref_id = G.display_xref_id

I suppose stable id corresponds to gene in HGNC_curated_gene external db.

ADD COMMENT
0
Entering edit mode
11.1 years ago

You have a line there that reads:

db_name: HGNC_curated_gene

Maybe the information of interest is represented by the database name.

ADD COMMENT

Login before adding your answer.

Traffic: 1732 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6