Question: Ensembl BioMart: why are some genes not found?
3.9 years ago by
biocyberman760
Denmark
biocyberman760 wrote:

I am trying to retrieve exon coordinates for all genes in Agilent's Clinical Research Exome via BioMart. Some genes are missing from the results. For example, AK6 is not found: Ensembl BioMart seems to mistake AK6 for TAF9. GeneCards (http://www.genecards.org/cgi-bin/carddisp.pl?gene=AK6) does the same, whereas Entrez says otherwise:

 

Entrez Gene summary for AK6 Gene:

This gene encodes a protein that belongs to the adenylate kinase family of enzymes. The protein has a nuclear localization and contains Walker A (P-loop) and Walker B motifs and a metal-coordinating residue. The protein may be involved in regulation of Cajal body formation. In human, AK6 and TAF9 (GeneID: 6880) are two distinct genes that share 5' exons. Alternative splicing results in multiple transcript variants. (provided by RefSeq, Sep 2013)
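One possible way to sidestep the symbol lookup is to query BioMart by Ensembl gene ID instead of by gene name. The sketch below only builds the query XML and request URL; it does not contact the server. The endpoint URL, the dataset name `hsapiens_gene_ensembl`, and the filter and attribute names are assumptions based on the public Ensembl BioMart interface and should be checked against the mart you are actually using.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

# Assumed endpoint and attribute names; verify them in the BioMart web interface.
MART_URL = "http://www.ensembl.org/biomart/martservice"
EXON_ATTRIBUTES = [
    "ensembl_gene_id",
    "external_gene_name",
    "chromosome_name",
    "exon_chrom_start",
    "exon_chrom_end",
    "strand",
    "ensembl_exon_id",
]


def build_exon_query(gene_ids, dataset="hsapiens_gene_ensembl"):
    """Return BioMart query XML requesting exon coordinates for the given ENSG IDs."""
    attrs = "".join(f"<Attribute name={quoteattr(a)} />" for a in EXON_ATTRIBUTES)
    ids = quoteattr(",".join(gene_ids))
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<!DOCTYPE Query>"
        '<Query virtualSchemaName="default" formatter="TSV" header="0" '
        'uniqueRows="1" datasetConfigVersion="0.6">'
        f'<Dataset name={quoteattr(dataset)} interface="default">'
        f'<Filter name="ensembl_gene_id" value={ids} />'
        f"{attrs}"
        "</Dataset></Query>"
    )


def build_request_url(gene_ids):
    """Return the full GET URL; fetch it with urllib.request.urlopen when online."""
    return MART_URL + "?" + urlencode({"query": build_exon_query(gene_ids)})


# The two gene IDs mentioned later in this thread.
query_xml = build_exon_query(["ENSG00000085231", "ENSG00000273841"])
print(query_xml)
```

Filtering on the gene ID means the result does not depend on which symbol is currently attached to the gene, so a mislabelled symbol would show up in the `external_gene_name` column rather than silently dropping the gene.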

genecards biomart ensembl gene • 1.2k views
modified 3.9 years ago by Emily_Ensembl18k • written 3.9 years ago by biocyberman760
3.9 years ago by
Emily_Ensembl18k
EMBL-EBI
Emily_Ensembl18k wrote:

This looks like something has gone wrong at our end. We're looking into it.

Update: This came to us via HGNC. We have now raised it with HGNC.

Update: This was based on some old HGNC data that has since been fixed. It will be fixed for Ensembl release 81 (due in July) – I'm afraid the update missed release 80, due this month.

modified 3.9 years ago • written 3.9 years ago by Emily_Ensembl18k

@Emily_Ensembl: I actually believe otherwise. In Ensembl, TAF9 has two ENSG IDs (ENSG00000085231 and ENSG00000273841), while HGNC has one ID for each gene:

AK6: ENSG00000085231; Entrez: 102157402; HGNC:49151; Link: http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=HGNC:49151

TAF9: ENSG00000273841; Entrez: 6880; HGNC:11542; Link: http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=HGNC:11542

So something happened on the Ensembl side!

I think it has something to do with mapping intervals/coordinates back to gene names.
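The comparison above can be sketched as a small cross-check between two symbol-to-gene-ID tables. The pairs below are just the ones quoted in this comment; in practice they would come from a BioMart export and an HGNC download, and the helper names `symbols_by_gene` and `find_conflicts` are hypothetical.

```python
from collections import defaultdict

# Symbol -> Ensembl gene ID pairs exactly as quoted in this thread.
ensembl_pairs = [
    ("TAF9", "ENSG00000085231"),
    ("TAF9", "ENSG00000273841"),
]
hgnc_pairs = [
    ("AK6", "ENSG00000085231"),
    ("TAF9", "ENSG00000273841"),
]


def symbols_by_gene(pairs):
    """Group gene symbols by Ensembl gene ID."""
    grouped = defaultdict(set)
    for symbol, gene_id in pairs:
        grouped[gene_id].add(symbol)
    return grouped


def find_conflicts(source_a, source_b):
    """Return {gene_id: (symbols_in_a, symbols_in_b)} where the sources disagree."""
    a, b = symbols_by_gene(source_a), symbols_by_gene(source_b)
    return {
        gene_id: (sorted(a.get(gene_id, ())), sorted(b.get(gene_id, ())))
        for gene_id in sorted(set(a) | set(b))
        if a.get(gene_id, set()) != b.get(gene_id, set())
    }


conflicts = find_conflicts(ensembl_pairs, hgnc_pairs)
for gene_id, (in_ensembl, in_hgnc) in conflicts.items():
    print(gene_id, "Ensembl:", in_ensembl, "HGNC:", in_hgnc)
```

Running the same check over a whole gene panel would list every gene ID whose symbol differs between the two sources, which is a quick way to see whether AK6 is an isolated case.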

modified 3.9 years ago • written 3.9 years ago by biocyberman760

What's happened is our links to HGNC come in via RefSeq and their links to RefSeq are wrong, so we've pulled in the wrong HGNCs. As I said, we're on the case.

written 3.9 years ago by Emily_Ensembl18k

Got it. I read too fast, sorry.

written 3.9 years ago by biocyberman760

Thanks for the update @Emily_Ensembl. I am currently in urgent need of GTF files for the GRCh37 and rat Rn6 releases. Could you point out how I can get or make them without the possible problem with HGNC data, and before release 81?

Thanks 

modified 3.9 years ago by Emily_Ensembl18k • written 3.9 years ago by biocyberman760
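As a stopgap, and assuming the problem is limited to gene names rather than coordinates, one option is to pull exons out of an existing GTF by `gene_id` and ignore the `gene_name` attribute altogether. The sketch below is a minimal illustration of that idea; the function name `exons_for_genes` is hypothetical, and the example rows use made-up identifiers and coordinates, not real annotation.

```python
import re

GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def exons_for_genes(gtf_lines, wanted_gene_ids):
    """Yield (gene_id, chrom, start, end, strand) for exon rows of the wanted genes."""
    wanted = set(wanted_gene_ids)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue  # keep only well-formed exon records
        match = GENE_ID_RE.search(fields[8])
        if match and match.group(1) in wanted:
            yield match.group(1), fields[0], int(fields[3]), int(fields[4]), fields[6]


# Hypothetical GTF rows with made-up IDs and coordinates, for illustration only.
example_gtf = [
    "#!genome-build example\n",
    '1\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "ENSG_EXAMPLE_A"; gene_name "GENE_A";\n',
    '1\tsrc\texon\t100\t250\t.\t+\t.\tgene_id "ENSG_EXAMPLE_A"; gene_name "GENE_A";\n',
    '1\tsrc\texon\t400\t900\t.\t+\t.\tgene_id "ENSG_EXAMPLE_A"; gene_name "GENE_B";\n',
    '2\tsrc\texon\t50\t80\t.\t-\t.\tgene_id "ENSG_EXAMPLE_B"; gene_name "GENE_B";\n',
]

exons = list(exons_for_genes(example_gtf, {"ENSG_EXAMPLE_A"}))
for record in exons:
    print(*record, sep="\t")
```

In the example, the second exon of `ENSG_EXAMPLE_A` carries a different `gene_name`, yet it is still returned because the selection never looks at the name.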